From PATBYRNE at uk.ibm.com Thu Oct 1 11:09:29 2015 From: PATBYRNE at uk.ibm.com (Patrick Byrne) Date: Thu, 1 Oct 2015 10:09:29 +0000 Subject: [gpfsug-discuss] Problem Determination Message-ID: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Oct 1 13:39:25 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Oct 2015 12:39:25 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: Hi Patrick I was going to mail you directly - but this may help spark some discussion in this area. GPFS (pardon the use of the "old school" term - you need something easier to type than Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it's a good start but doesn't cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented - or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn't go into alternative explanations for many errors. Example: When a node gets expelled, the PD guide tells you it's a communication issue, when in fact it may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information - understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing 'mmdiag --config'. Using some of these can help you - or get you in trouble in the long run. If I can see the option, document it. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that's out of date or inaccurate. This leads to confusion. - The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. - GPFS doesn't have anything set up to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. mmces - I did some early testing on this but haven't had a chance to upgrade my protocol nodes to the new level. Upgrading 1000s of nodes across many clusters is ... challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on...
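As a rough illustration of the checks being discussed here (waiters, undocumented configuration options, and the newer protocol commands), a minimal first-pass triage sketch follows. The mmdiag and mmces invocations are based on their documented usage, but output formats and exact mmces subcommand syntax vary between releases, and the grep pattern is only an example.

```bash
#!/bin/bash
# First-pass triage sketch for a suspect node (run as root; illustrative only).

# Long-lived waiters usually point at the layer worth investigating
# (disk I/O, network/RPC, token management).
mmdiag --waiters

# Effective configuration values, including options the documentation does
# not cover -- change undocumented ones only with support guidance.
mmdiag --config | grep -i -E 'pagepool|maxmbps|worker1threads'

# On CES protocol nodes (4.1.1 and later): current state and recent events.
mmces state show -a
mmces events list
```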
Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 2 17:44:24 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 2 Oct 2015 16:44:24 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05C8CE44@CHI-EXCHANGEW1.w2k.jumptrading.com> I would like to strongly echo what Bob has stated, especially the documentation or wrong documentation, and I have in-lining some comments below. I liken GPFS to a critical care patient at the hospital. You have to check on the state regularly, know the running heart rate (e.g. waiters), the response of every component from disk, to networks, to server load, etc. When a problem occurs, running tests (such as nsdperf) to help isolate the problem quickly is crucial. Capturing GPFS trace data is also very important if the problem isn?t obvious. But then you have to wait for IBM support to parse the information and give you their analysis of the situation. It would be great to get an advanced troubleshooting document that describes how to read the output of `mmfsadm dump` commands and the GPFS trace report that is generated. Cheers, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Oesterlin, Robert Sent: Thursday, October 01, 2015 7:39 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem Determination Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? 
You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. [Bryan: Also please, please provide a way to check whether or not the configuration parameters need to be changed. I assume that there is a `mmfsadm dump` command that can tell you whether the config parameter needs to be changed, if not make one! Just stating something like ?This could be increased to XX value for very large clusters? is not very helpful. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. [Bryan: I know that Scott Fadden is a busy man, so I would recommend helping distribute the workload of maintaining the wiki documentation. This data should be reviewed on a more regular basis, at least once for each major release I would hope, and updated or deleted if found to be out of date.] - The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. [Bryan: From what I?ve heard, IBM is actively working to make the deadlock amelioration logic better. I agree that firing off traces can cause more problems, and we have turned off the automated collection as well. We are going to work on enabling the collection of some data during these events to help ensure we get enough data for IBM to analyze the problem.] - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. [Bryan: The GPFS callback facilities are very useful for setting up alerts, but not well documented or advertised by the GPFS manuals. I hope to see more callback capabilities added to help monitor all aspects of the GPFS cluster and file systems] mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? 
challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 2 17:58:41 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 2 Oct 2015 16:58:41 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com>, Message-ID: I agree on docs, particularly on mmdiag, I think things like --lroc are not documented. I'm also not sure that --network always gives accurate network stats. (we were doing some ha failure testing where we have split site in and fabrics, yet the network counters didn't change even when the local ib nsd servers were shut down). 
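On the callback and alerting points above, a minimal sketch of wiring the existing callback facility to a notification script is below. The script path is illustrative, and the chosen event and parameter variables should be checked against the mmaddcallback documentation for the release in use.

```bash
# Run a local alerting script whenever a node leaves the cluster, passing the
# event name and the node that observed it (the script itself is assumed to
# send mail or page someone and is not shown here).
mmaddcallback nodeLeaveAlert \
  --command /usr/local/sbin/gpfs_alert.sh \
  --event nodeLeave \
  --parms "%eventName %myNode"

# Confirm the callback is registered.
mmlscallback
```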
It would be nice also to have a set of Icinga/Nagios plugins from IBM, maybe in samples whcich are updated on each release with new feature checks. And not problem determination, but id really like to see an inflight non disruptive upgrade path. Particularly as we run vms off gpfs, its bot always practical or possible to move vms, so would be nice to have upgrade in flight (not suggesting this would be a quick thing to implement). Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 01 October 2015 13:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem Determination Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. - The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? 
Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited From ewahl at osc.edu Fri Oct 2 19:00:46 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 2 Oct 2015 18:00:46 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> I'm not yet in the 4.x release stream so this may be taken with a grain (or more) of salt as we say. PLEASE keep the ability of commands to set -x or dump debug when the env DEBUG=1 is set. This has been extremely useful over the years. Granted I've never worked out why sometimes we see odd little things like machines deciding they suddenly need an FPO license or one nsd server suddenly decides it's name is part of the FQDN instead of just it's hostname and only for certain commands, but it's DAMN useful. Minor issues especially can be tracked down with it. Undocumented features and logged items abound. I'd say start there. This is one area where it is definitely more art than science with Spectrum Scale (meh GPFS still sounds better. So does Shark. Can we go back to calling it the Shark Server Project?) Complete failure of the verbs layer and fallback to other defined networks would be nice to know about during operation. It's excellent about telling you at startup but not so much during operation, at least in 3.5. I imagine with the 'automated compatibility layer building' I'll be looking for some serious amounts of PD for the issues we _will_ see there. We frequently build against kernels we are not yet running at this site, so this needs well documented PD and resolution. 
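As a small illustration of the DEBUG behaviour described above, the trace from an administration command can be captured for later inspection; the output path is illustrative.

```bash
# Most mm* administration commands are shell scripts that emit 'set -x'
# style tracing on stderr when DEBUG=1 is set in the environment.
DEBUG=1 mmgetstate -a 2> /tmp/mmgetstate.trace.$$
less /tmp/mmgetstate.trace.$$
```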
Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Patrick Byrne [PATBYRNE at uk.ibm.com] Sent: Thursday, October 01, 2015 6:09 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Oct 2 21:27:17 2015 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 2 Oct 2015 16:27:17 -0400 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: I would like to see better performance metrics / counters from GPFS. I know we already have mmpmon, which is generally really good -- I've done some fun things with it and it has been a great tool. And, I realize that there is supposedly a new monitoring framework in 4.x.. which I haven't played with yet. But, Generally it would be extremely helpful to get synchronized (across all nodes) high accuracy counters of data flow, number of waiters, page pool stats, distribution of data from one layer to another down to NSDs.. etc etc etc. I believe many of these counters already exist, but they're hidden in some mmfsadm xx command that one needs to troll through with possible performance implications. mmpmon can do some of this, but it's only a handful of counters, it's hard to say how synchronized the counters are across nodes, and I've personally seen an mmpmon run go bad and take down a cluster. It would be nice if it were pushed out, or provided in a safe manner with the design and expectation of "log-everything forever continuously". As GSS/ESS systems start popping up, I realize they have this other monitoring framework to watch the VD throughputs.. which is great. But, that doesn't allow us to monitor more traditional types. 
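For reference, the kind of mmpmon counters mentioned here can be pulled in machine-readable form as sketched below; the request names follow the mmpmon documentation, while the scheduling and cross-node aggregation a real monitoring agent would need are left out.

```bash
# Parseable (-p), prompt-suppressed (-s) mmpmon requests: per-filesystem and
# node-wide I/O counters. A monitoring agent would run this periodically on
# every node and ship the numbers to a central time-series store.
echo -e "fs_io_s\nio_s" | mmpmon -p -s
```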
Would be nice to monitor it all together the same way so we don't miss-out on monitoring half the infrastructure or buying a cluster with some fancy GUI that can't do what we want.. -Zach On Fri, Oct 2, 2015 at 2:00 PM, Wahl, Edward wrote: > I'm not yet in the 4.x release stream so this may be taken with a grain (or > more) of salt as we say. > > PLEASE keep the ability of commands to set -x or dump debug when the env > DEBUG=1 is set. This has been extremely useful over the years. Granted > I've never worked out why sometimes we see odd little things like machines > deciding they suddenly need an FPO license or one nsd server suddenly > decides it's name is part of the FQDN instead of just it's hostname and only > for certain commands, but it's DAMN useful. Minor issues especially can be > tracked down with it. > > Undocumented features and logged items abound. I'd say start there. This > is one area where it is definitely more art than science with Spectrum Scale > (meh GPFS still sounds better. So does Shark. Can we go back to calling it > the Shark Server Project?) > > Complete failure of the verbs layer and fallback to other defined networks > would be nice to know about during operation. It's excellent about telling > you at startup but not so much during operation, at least in 3.5. > > I imagine with the 'automated compatibility layer building' I'll be looking > for some serious amounts of PD for the issues we _will_ see there. We > frequently build against kernels we are not yet running at this site, so > this needs well documented PD and resolution. > > Ed Wahl > OSC > > > ________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] > on behalf of Patrick Byrne [PATBYRNE at uk.ibm.com] > Sent: Thursday, October 01, 2015 6:09 AM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Problem Determination > > Hi all, > > As I'm sure some of you aware, problem determination is an area that we are > looking to try and make significant improvements to over the coming releases > of Spectrum Scale. To help us target the areas we work to improve and make > it as useful as possible I am trying to get as much feedback as I can about > different problems users have, and how people go about solving them. > > I am interested in hearing everything from day to day annoyances to problems > that have caused major frustration in trying to track down the root cause. > Where possible it would be great to hear how the problems were dealt with as > well, so that others can benefit from your experience. Feel free to reply to > the mailing list - maybe others have seen similar problems and could provide > tips for the future - or to me directly if you'd prefer > (patbyrne at uk.ibm.com). > > On a related note, in 4.1.1 there was a component added that monitors the > state of the various protocols that are now supported (NFS, SMB, Object). > The output from this is available with the 'mmces state' and 'mmces events' > CLIs and I would like to get feedback from anyone who has had the chance > make use of this. Is it useful? How could it be improved? We are looking at > the possibility of extending this component to cover more than just > protocols, so any feedback would be greatly appreciated. 
> > Thanks in advance, > > Patrick Byrne > IBM Spectrum Scale - Development Engineer > IBM Systems - Manchester Lab > IBM UK Limited > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From Luke.Raimbach at crick.ac.uk Mon Oct 5 13:57:14 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 5 Oct 2015 12:57:14 +0000 Subject: [gpfsug-discuss] Independent Inode Space Limit Message-ID: Hi All, When creating an independent inode space, I see the valid range for the number of inodes is between 1024 and 4294967294. Is the ~4.2billion upper limit something that can be increased in the future? I also see that the first 1024 inodes are immediately allocated upon creation. I assume these are allocated to internal data structures and are a copy of a subset of the first 4038 inodes allocated for new file systems? It would be useful to know if these internal structures are fixed for independent filesets and if they are not, what factors determine their layout (for performance purposes). Many Thanks, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From usa-principal at gpfsug.org Mon Oct 5 14:55:15 2015 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Mon, 05 Oct 2015 09:55:15 -0400 Subject: [gpfsug-discuss] Final Reminder: Inaugural US "Meet the Developers" Message-ID: <9656d0110c2be4b339ec5ce662409b8e@webmail.gpfsug.org> A last reminder to check in with Janet if you have not done so already. Looking forward to this event on Wednesday this week. Best, Kristy --- Hello Everyone, Here is a reminder about our inaugural US "Meet the Developers" session. Details are below, and please send an e-mail to Janet Ellsworth (janetell at us.ibm.com) by next Friday September 18th if you wish to attend. Janet is on the product management team for Spectrum Scale and is helping with the logistics for this first event. Date: Wednesday, October 7th Place: IBM building at 590 Madison Avenue, New York City Time: 12:30 to 5 PM (Lunch will be served at 12:30, and sessions will start between 1 and 1:30 PM. Afternoon snacks will be served as well :-) Agenda IBM development architect to present the new protocols support that was released with Spectrum Scale 4.1.1 in June. IBM developer to demo future Graphical User Interface ***Member of user community to present an experience with using Spectrum Scale (still seeking volunteers for this !)*** Open Q&A with the development team We are happy to have heard from many of you so far who would like to attend. We still have room however, so please get in touch by the 9/18 date if you would like to attend. ***We also need someone to share an experience or use case scenario with Spectrum Scale for this event, so please let Janet know if you are willing to do that too.*** As you have likely seen, we are also working on the agenda and timing for day-long GPFS US UG event in Austin during November aligned with SC15 and there will be more details on that coming soon. 
From secretary at gpfsug.org Wed Oct 7 12:50:51 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 07 Oct 2015 12:50:51 +0100 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: <813d82bd5074b90c3a67acc85a03995b@webmail.gpfsug.org> Hi All, There are still places available for the next 'Meet the Devs' event in Edinburgh on Friday 23rd October from 10:30/11am until 3/3:30pm. It's a great opportunity for you to meet with developers and talk through specific issues as well as learn more from the experts. Location: Room 2009a, Information Services, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD Google maps link: https://goo.gl/maps/Ta7DQ Agenda: - GUI - 4.2 Updates/show and tell - Open conversation on any areas of interest attendees may have Lunch and refreshments will be provided. Please email me (secretary at gpfsug.org) if you would like to attend including any particular topics of interest you would like to discuss. Best wishes, -- Claire O'Toole GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org From service at metamodul.com Wed Oct 7 16:06:56 2015 From: service at metamodul.com (service at metamodul.com) Date: Wed, 07 Oct 2015 17:06:56 +0200 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: Hi Claire, I will attend the meeting. Hans-Joachim Ehlers MetaModul GmbH Germany Cheers Hajo Von Samsung Mobile gesendet
-------- Original message --------
From: Secretary GPFS UG
Date: 2015.10.07 13:50 (GMT+01:00)
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Places available: Meet the Devs
Hi All, There are still places available for the next 'Meet the Devs' event in Edinburgh on Friday 23rd October from 10:30/11am until 3/3:30pm. It's a great opportunity for you to meet with developers and talk through specific issues as well as learn more from the experts. Location: Room 2009a, Information Services, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD Google maps link: https://goo.gl/maps/Ta7DQ Agenda: - GUI - 4.2 Updates/show and tell - Open conversation on any areas of interest attendees may have Lunch and refreshments will be provided. Please email me (secretary at gpfsug.org) if you would like to attend including any particular topics of interest you would like to discuss. Best wishes, -- Claire O'Toole GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Wed Oct 7 19:59:26 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Wed, 7 Oct 2015 18:59:26 +0000 Subject: [gpfsug-discuss] new member, first post Message-ID: sitting here in the US GPFS UG meeting in NYC and just found out about this list. We've been a GPFS user for many years, first with integrated DDN support, but now also with a GSS system. we have about 4PB of raw GPFS storage and 1 billion inodes. We keep our metadata on TMS ramsan for very fast policy execution for tiering and migration. We use GPFS to hold the primary source data from our custom supercomputers. We have many policies executed periodically for managing the data, including writing certain files to dedicated fast pools and then migrating the data off to wide swaths of disk for read access from cluster clients. One pain point, which I'm sure many of the rest of you have seen, restripe operations for just metadata are unnecessarily slow. If we experience a flash module failure and need to restripe, it also has to check all of the data. I have a feature request open to make metadata restripes only look at metadata (since it is on RamSan/FlashCache, this should be very fast) instead of scanning everything, which can and does take months with performance impacts. Doug Hughes D. E. Shaw Research, LLC. Sent from my android device. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Thu Oct 8 20:37:05 2015 From: chair at gpfsug.org (GPFS UG Chair (Simon Thompson)) Date: Thu, 08 Oct 2015 20:37:05 +0100 Subject: [gpfsug-discuss] User group update Message-ID: Hi, I thought I'd drop an update to the group on various admin things which have been going on behind the scenes. The first US meet the devs event was held yesterday, and I'm hoping someone who went will be preparing a blog post to cover the event a little. I know a bunch of people have joined the mailing list since then, so welcome to the group to all of those! ** User Group Engagement with IBM ** I also met with Akhtar yesterday who is the IBM VP for Technical Computing Developments (which includes Spectrum Scale). He was in the UK for a few days at the IBM Manchester Labs, so we managed to squeeze a meeting to talk a bit about the UG. I'm very pleased that Akhtar confirmed IBMs commitment to help the user group in both the UK and USA with developer support for the meet the devs and annual group meetings. 
I'd like to extend my thanks to those at IBM who are actively supporting the group in so many ways. One idea we have been mulling over is filming the talks at next year's events and then putting those on Youtube for people who can't get there. IBM have given us tentative agreement to do this, subject to a few conditions. Most importantly that the UG and IBM ensure we don't publish customer or IBM items which are NDA/not for general public consumption. I'm hopeful we can get this all approved and if we do, we'll be looking to the community to help us out (anyone got digital camera equipment we might be able to borrow, or some help with editing down afterwards?) Whilst in Manchester I also met with Patrick to talk over the various emails people have sent in about problem determination, which Patrick will be taking to the dev meeting in a few weeks. It sounds like there are some interesting ideas kicking about, so hopefully we'll get some value from the user group input. Some of the new features in 4.2 were also demo'd and for those who might not have been to a meet the devs session and are interested in the upcoming GUI, it is now in public beta, head over to developer works for more details: https://www.ibm.com/developerworks/community/forums/html/topic?id=4dc34bf1- 17d1-4dc0-af72-6dc5a3f93e82&ps=25 ** User Group Feedback ** Over the past few months, I've also been collecting feedback from people, either comments on the mailing list, or those who I've spoken to, which was all collated and sent in to IBM, we'll hopefully be getting some feedback on that in the next few weeks - there's a bunch of preliminary answers now, but a few places we still need a bit of clarification. There's also some longer term discussion going on about GPFS and cloud (in particular to those of us in scientific areas). We'll feed that back as and when we get responses we can share. We'd like to ensure that we gather as much feedback from users so that we can collectively take it to IBM, so please do continue to post comments etc to the mailing list. ** Diary Dates ** A few dates for diaries: * Meet the Devs in Edinburgh - Friday 23rd October 2015 * GPFS UG Meeting @ SC15 in Austin, USA - Sunday 15th November 2015 * GPFS UG Meeting @ Computing Insight UK, Coventry, UK - Tuesday 8th December 2015 (Note you must be registered also for CIUK) * GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May 2016 ** User Group Admin ** Within the committee, we've been talking about how we can extend the reach of the group, so we may be reaching out to a few group members to take this forward. Of course if anyone has suggestions on how we can ensure we reach as many people as possible, please let me know, either via the mailing list of directly by email. I know there are lot of people on the mailing list who don't post (regularly), so I'd be interested to hear if you find the group mailing list discussion useful, if you feel there are barriers to asking questions, or what you'd like to see coming out of the user group - please feel free to email me directly if you'd like to comment on any of this! We've also registered spectrumscale.org to point to the user group, so you may start to see the group marketed as the Spectrum Scale User Group, but rest assured, its still the same old GPFS User Group ;-) Just a reminder that we made the mailing list so that only members can post. This was to reduce the amount of spam coming in and being held for moderation (and a few legit posts got lost this way). 
If you do want to post, but not receive the emails, you can set this as an option in the mailing list software. Finally, I've also fixed the mailing list archives, so these are now available at: http://www.gpfsug.org/pipermail/gpfsug-discuss/ Simon GPFS UG, UK Chair From L.A.Hurst at bham.ac.uk Fri Oct 9 09:25:52 2015 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst (IT Services)) Date: Fri, 9 Oct 2015 08:25:52 +0000 Subject: [gpfsug-discuss] User group update Message-ID: On 08/10/2015 20:37, "gpfsug-discuss-bounces at gpfsug.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May >2016 Daft question: is that 17th *and* 18th or 17th *or* 18th (presumably TBC)? Thanks, Laurence -- Laurence Hurst Research Support, IT Services, University of Birmingham From S.J.Thompson at bham.ac.uk Fri Oct 9 10:00:11 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 9 Oct 2015 09:00:11 +0000 Subject: [gpfsug-discuss] User group update In-Reply-To: References: Message-ID: Both days. May 2016 is a two day event. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Laurence Alexander Hurst (IT Services) [L.A.Hurst at bham.ac.uk] Sent: 09 October 2015 09:25 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] User group update On 08/10/2015 20:37, "gpfsug-discuss-bounces at gpfsug.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May >2016 Daft question: is that 17th *and* 18th or 17th *or* 18th (presumably TBC)? Thanks, Laurence -- Laurence Hurst Research Support, IT Services, University of Birmingham _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Sat Oct 10 14:54:22 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sat, 10 Oct 2015 13:54:22 +0000 Subject: [gpfsug-discuss] User group update Message-ID: > >We've also registered spectrumscale.org to point to the user group, so you >may start to see the group marketed as the Spectrum Scale User Group, but >rest assured, its still the same old GPFS User Group ;-) And this is just a test mail to ensure that mail to gpfsug-discuss at spectrumscale.org gets through OK. The old address should also still work. Simon From S.J.Thompson at bham.ac.uk Sat Oct 10 14:55:55 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sat, 10 Oct 2015 13:55:55 +0000 Subject: [gpfsug-discuss] User group update In-Reply-To: References: Message-ID: On 10/10/2015 14:54, "Simon Thompson (Research Computing - IT Services)" wrote: >> >>We've also registered spectrumscale.org to point to the user group, so >>you >>may start to see the group marketed as the Spectrum Scale User Group, but >>rest assured, its still the same old GPFS User Group ;-) > >And this is just a test mail to ensure that mail to >gpfsug-discuss at spectrumscale.org gets through OK. The old address should >also still work. And checking the old address still works fine as well. 
Simon From Robert.Oesterlin at nuance.com Tue Oct 13 03:03:45 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 13 Oct 2015 02:03:45 +0000 Subject: [gpfsug-discuss] User group Meeting at SC15 - Registration Message-ID: We?d like to have all those attending the user group meeting at SC15 to register ? details are below. Thanks to IBM for getting the space and arranging all the details. I?ll post a more detailed agenda soon. Looking forward to meeting everyone! Location: JW Marriott 110 E 2nd Street Austin, Texas United States Date and Time: Sunday Nov 15, 1:00 PM?5:30 PM Agenda: - Latest IBM Spectrum Scale enhancements - Future directions and roadmap* (NDA required) - Newer usecases and User presentations Registration: Please register at the below link to book your seat. https://www-950.ibm.com/events/wwe/grp/grp017.nsf/v17_agenda?openform&seminar=99QNTNES&locale=en_US&S_TACT=sales Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Sat Oct 17 20:51:50 2015 From: chair at spectrumscale.org (GPFS UG Chair (Simon Thompson)) Date: Sat, 17 Oct 2015 20:51:50 +0100 Subject: [gpfsug-discuss] Blog on USA Meet the Devs Message-ID: Hi All, Kirsty wrote a blog post on the inaugural meet the devs in the USA. You can find it here: http://www.spectrumscale.org/inaugural-usa-meet-the-devs/ Thanks to Kristy, Bob and Pallavi for organising, the IBM devs and the group members giving talks. Simon From Tomasz.Wolski at ts.fujitsu.com Wed Oct 21 15:23:54 2015 From: Tomasz.Wolski at ts.fujitsu.com (Wolski, Tomasz) Date: Wed, 21 Oct 2015 16:23:54 +0200 Subject: [gpfsug-discuss] Intro Message-ID: Hi All, My name is Tomasz Wolski and I?m development engineer at Fujitsu Technology Solutions in Lodz, Poland. We?ve been using GPFS in our main product, which is ETERNUS CS8000, for many years now. GPFS helps us to build a consolidation of backup and archiving solutions for our end customers. We make use of GPFS snapshots, NIFS/CIFS services, GPFS API for our internal components and many many more .. :) My main responsibility, except developing new features for our system, is integration new GPFS versions into our system and bug tracking GPFS issues. Best regards, Tomasz Wolski -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 23 15:04:49 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 23 Oct 2015 14:04:49 +0000 Subject: [gpfsug-discuss] Independent Inode Space Limit Message-ID: >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? 
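For anyone following along, a minimal sketch of the commands under discussion is below; the device name, fileset name, junction path and inode limits are all illustrative.

```bash
# Create an independent fileset with its own inode space, capping it at
# 1M inodes and preallocating 100K of them up front.
mmcrfileset gpfs01 projects --inode-space new --inode-limit 1000000:100000
mmlinkfileset gpfs01 projects -J /gpfs/gpfs01/projects

# -L shows the inode space plus maximum and allocated inodes per fileset;
# the limit can be raised later with mmchfileset --inode-limit.
mmlsfileset gpfs01 -L
```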
Thanks Simon From sfadden at us.ibm.com Fri Oct 23 13:42:14 2015 From: sfadden at us.ibm.com (Scott Fadden) Date: Fri, 23 Oct 2015 07:42:14 -0500 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: References: Message-ID: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> GPFS limits the max inodes based on metadata space. Add more metadata space and you should be able to add more inodes. Scott Fadden Spectrum Scale - Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/storage/spectrum/scale From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 10/23/2015 09:05 AM Subject: Re: [gpfsug-discuss] Independent Inode Space Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sfadden at us.ibm.com Fri Oct 23 13:42:14 2015 From: sfadden at us.ibm.com (Scott Fadden) Date: Fri, 23 Oct 2015 07:42:14 -0500 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: References: Message-ID: <201510231442.t9NEgQ0M024262@d01av05.pok.ibm.com> GPFS limits the max inodes based on metadata space. Add more metadata space and you should be able to add more inodes. Scott Fadden Spectrum Scale - Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/storage/spectrum/scale From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 10/23/2015 09:05 AM Subject: Re: [gpfsug-discuss] Independent Inode Space Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From wsawdon at us.ibm.com Fri Oct 23 16:25:33 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Fri, 23 Oct 2015 08:25:33 -0700 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> References: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> Message-ID: <201510231525.t9NFPr1G010768@d03av04.boulder.ibm.com> >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Independent filesets don't have the internal structures that the file system has. Other than the fileset's root directory all of the remaining inodes can be allocated to user files. Inodes are always allocated in full metadata blocks. The inodes for an independent fileset are allocated in their own blocks. This makes fileset snapshots more efficient, since a copy-on-write of the block of inodes will only copy inodes in the fileset. The inode blocks for all filesets are in the same inode file, but the blocks for each independent fileset are strided, making them easy to prefetch for policy scans. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Fri Oct 23 16:25:33 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Fri, 23 Oct 2015 08:25:33 -0700 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> References: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> Message-ID: <201510231525.t9NFPv9P004320@d01av03.pok.ibm.com> >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Independent filesets don't have the internal structures that the file system has. Other than the fileset's root directory all of the remaining inodes can be allocated to user files. Inodes are always allocated in full metadata blocks. The inodes for an independent fileset are allocated in their own blocks. This makes fileset snapshots more efficient, since a copy-on-write of the block of inodes will only copy inodes in the fileset. The inode blocks for all filesets are in the same inode file, but the blocks for each independent fileset are strided, making them easy to prefetch for policy scans. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From kallbac at iu.edu Mon Oct 26 02:38:52 2015 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Sun, 25 Oct 2015 22:38:52 -0400 Subject: [gpfsug-discuss] ILM and Backup Question Message-ID: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. 
I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From st.graf at fz-juelich.de Mon Oct 26 08:43:33 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 26 Oct 2015 09:43:33 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: <562DE7B5.7080303@fz-juelich.de> Hi! We at J?lich Supercomputing Centre have two ILM managed file systems (GPFS and HSM from TSM). #50 mio files + 10 PB data on tape #30 mio files + 8 PB data on tape For backup we use mmbackup (dsmc) for the user HOME directory (no ILM) #120 mio files => 3 hours get candidate list + x hour backup We use also mmbackup for the ILM managed filesystem. Policy: the file must be backed up first before migrated to tape 2-3 hour for candidate list + x hours/days/weeks backups (!!!) -> a metadata change (e.g. renaming a directory by the user) enforces a new backup of the files which causes a very expensive tape inline copy! Greetings from J?lich, Germany Stephan On 10/26/15 03:38, Kallback-Rose, Kristy A wrote: Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. 
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Douglas.Hughes at DEShawResearch.com Mon Oct 26 13:42:47 2015
From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug)
Date: Mon, 26 Oct 2015 13:42:47 +0000
Subject: [gpfsug-discuss] ILM and Backup Question
In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu>
References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu>
Message-ID:

We have all of our GPFS metadata on FlashCache devices (nee Ramsan) and that helps a lot. We also have our data going into monotonically increasing buckets of about 30TB that we call lockers (e.g. locker100, locker101, locker102), with 1 primary active at a time. We have an hourly job that scans the most recent 2 lockers (takes about 45 seconds each) to generate a file list, using the ILM 'LIST' policy, of all files that have been modified or created in the last hour. That goes to a file that has all of the names, which then trickles to a custom backup daemon that has up to 10 threads for rsyncing these over to our HSM server (running GPFS/TSM space management). From there things automatically get backed up and archived. Not all hourlies are necessarily complete (we can't guarantee that nobody is still hanging on to $lockernum-2 for instance), so we have a daily that scans the entire 3PB to find anything created/updated in the last 24 hours and does an rsync on that. There's no harm in duplication of hourlies from the rsync perspective because rsync takes care of that (already exists on destination). The daily job takes about 45 minutes. Needless to say it would be impossible without metadata on a fast flash device.

Sent from my android device.

-----Original Message-----
From: "Kallback-Rose, Kristy A"
To: gpfsug main discussion list
Sent: Sun, 25 Oct 2015 22:39
Subject: [gpfsug-discuss] ILM and Backup Question

Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think it's amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!"
Yes, please! I'm *very* interested in what others are doing.
We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration - we have had HPSS for a very long time), but I'm interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc.
If you have anything going on at your site that's relevant, can you please share?
Thanks,
Kristy
Kristy Kallback-Rose
Manager, Research Storage
Indiana University
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From S.J.Thompson at bham.ac.uk Mon Oct 26 20:15:26 2015
From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services))
Date: Mon, 26 Oct 2015 20:15:26 +0000
Subject: [gpfsug-discuss] ILM and Backup Question
In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu>
References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu>
Message-ID:

Hi Kristy,

Yes thanks for picking this up. So we (UoB) have 3 GPFS environments, each with different approaches.

1. OpenStack (GPFS as infrastructure) - we don't back this up at all. Partly this is because we are still in pilot phase, and partly because we also have ~7PB CEPH over 4 sites for this project, and the longer term aim is for us to ensure data sets and important VM images are copied into the CEPH store (and then replicated to at least 1 other site). We have some challenges with this, how should we do this? We're sorta thinking about maybe going down the irods route for this, policy scan the FS maybe, add xattr onto important data, and use that to get irods to send copies into CEPH (somehow). So this would be a bit of a hybrid home-grown solution going on here. Anyone got suggestions about how to approach this? I know IBM are now an irods consortium member, so any magic coming from IBM to integrate GPFS and irods?

2. HPC.
We differentiate on our HPC file-system between backed up and non backed up space. Mostly it's non backed up, where we encourage users to keep scratch data sets. We provide a small(ish) home directory which is backed up with TSM to tape, and we also back up applications and system configs of the system. We use a bunch of jobs to sync some configs into local git, which is also stored in the backed up part of the FS, so things like switch configs and icinga config can be backed up sanely.

3. Research Data Storage. This is a large bulk data storage solution. So far it's not actually that large (few hundred TB), so we take the traditional TSM back to tape approach (it's also sync replicated between data centres). We're already starting to see some possible slowness on this with data ingest and we've only just launched the service. Maybe it is just because we have launched that we suddenly see high data ingest. We are also experimenting with HSM to tape, but other than that we have no other ILM policies - only two tiers of disk, SAS for metadata and NL-SAS for bulk data. I'd like to see a flash tier in there for metadata, which would free SAS drives and so we might be more into ILM policies. We have some more testing with snapshots to do, and have some questions about recovery of HSM files if the FS is snapshotted. Anyone any experience with this with 4.1 upwards versions of GPFS?

Straight TSM backup for us means we can end up with 6 copies of data - one per data centre, backup + offsite backup tape set, HSM pool + offsite copy of HSM pool. (If an HSM tape fails, how do we know what to restore from backup? Hence we make copies of the HSM tapes as well). As our backups run on TSM, it uses the policy engine and mmbackup, so we only back up changes and new files, and never back up twice from the FS.

Does anyone know how TSM backups handle XATTRs? This is one of the questions that was raised at meet the devs. Or even other attributes like immutability, as unless you are in compliant mode, it's possible for immutable files to be deleted in some cases. In fact this is an interesting topic, it just occurred to me, what happens if your HSM tape fails and it contained immutable files. Would it be possible to recover these files if you don't have a copy of the HSM tape? - can you do a synthetic recreate of the TSM HSM tape from backups? We typically tell users that backups are for DR purposes, but that we'll make efforts to try and restore files subject to resource availability.

Is anyone using SOBAR? What is your rationale for this? I can see that at scale, there are a lot of benefits to this. But how do you handle users corrupting/deleting files etc? My understanding of SOBAR is that it doesn't give you the same ability to recover versions of files, deletions etc that straight TSM backup does. (This is something I've been meaning to raise for a while here.)

So what do others do? Do you have similar approaches to not backing up some types of data/areas? Do you use TSM or home-grown solutions? Or even other commercial backup solutions? What are your rationales for making decisions on backup approaches? Has anyone built their own DMAPI type interface for doing these sorts of things? Snapshots only? Do you allow users to restore themselves? If you are using ILM, are you doing it with straight policy, or is TSM playing part of the game?

(If people want to comment anonymously on this without committing their company on list, happy to take email to the chair@ address and forward on anonymously to the group).
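As a rough illustration of the "straight policy" approach Doug describes above (an hourly LIST scan whose output feeds an external copy tool), a minimal sketch might look like the following. The file system name, paths and list name are invented placeholders, and the interval syntax and list-record format should be checked against the mmapplypolicy documentation for your release:

  # changed-files.pol: select everything modified in the last hour
  RULE EXTERNAL LIST 'changed' EXEC ''   /* empty EXEC: just write the list file */
  RULE 'recent' LIST 'changed'
       WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' HOURS

  # run the scan, defer execution, and write the list to /tmp/hourly.list.changed
  mmapplypolicy /gpfs/locker102 -P changed-files.pol -I defer -f /tmp/hourly

  # strip the "inode gen snapid --" prefix and hand the paths to a copy tool
  # (a real backup daemon would add retries and multiple rsync threads)
  sed 's/^.* -- //' /tmp/hourly.list.changed > /tmp/hourly.paths
  rsync -a --files-from=/tmp/hourly.paths / hsmserver:/backup/staging/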
Simon On 26/10/2015, 02:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kallback-Rose, Kristy A" wrote: >Simon wrote recently in the GPFS UG Blog: "We also got into discussion on >backup and ILM, and I think its amazing how everyone does these things in >their own slightly different way. I think this might be an interesting >area for discussion over on the group mailing list. There's a lot of >options and different ways to do things!? > >Yes, please! I?m *very* interested in what others are doing. > >We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS >Integration?we have had HPSS for a very long time), but I?m interested >what others are doing with either ILM or other methods to brew their own >backup solutions, how much they are backing up and with what regularity, >what resources it takes, etc. > >If you have anything going on at your site that?s relevant, can you >please share? > >Thanks, >Kristy > >Kristy Kallback-Rose >Manager, Research Storage >Indiana University From wsawdon at us.ibm.com Mon Oct 26 21:12:55 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Mon, 26 Oct 2015 13:12:55 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <562DE7B5.7080303@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> Message-ID: <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> > From: Stephan Graf > > For backup we use mmbackup (dsmc) > for the user HOME directory (no ILM) > #120 mio files => 3 hours get candidate list + x hour backup That seems rather slow. What version of GPFS are you running? How many nodes are you using? Are you using a "-g global shared directory"? The original mmapplypolicy code was targeted to a single node, so by default it still runs on a single node and you have to specify -N to run it in parallel. When you run multi-node there is a "-g" option that defines a global shared directory that must be visible to all nodes specified in the -N list. Using "-g" with "-N" enables a scale-out parallel algorithm that substantially reduces the time for candidate selection. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Mon Oct 26 22:22:58 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Mon, 26 Oct 2015 14:22:58 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> > From: "Simon Thompson (Research Computing - IT Services)" > > Does anyone know how TSM backups handle XATTRs? TSM capture XATTRs and ACLs in an opaque "blob" using gpfs_fgetattrs. Unfortunately, TSM stores the opaque blob with the file data. Changes to the blob require the data to be backed up again. > Or even other attributes like immutability, Immutable files may be backed up and restored as immutable files. Immutability is restored after the data has been restored. > can you do a synthetic recreate of the TSM HSM tape from backups? TSM stores data from backups and data from HSM in different pools. A file that is both HSM'ed and backed up will have at least two copies of data off-line. I suspect that losing a tape from the HSM pool will have no effect on the backup pool, but you should verify that with someone from TSM. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... 
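Picking up Wayne's earlier point about -g and -N for the candidate scan: the difference is easiest to see on the command line. A sketch only - node names, file system and paths are placeholders, and the global work directory must be somewhere all the listed nodes can see (e.g. inside GPFS itself):

  # single node, classic behaviour
  mmapplypolicy gpfs1 -P backup-candidates.pol -I test

  # scale-out candidate selection across two NSD servers
  mmapplypolicy gpfs1 -P backup-candidates.pol -I test \
      -N nsd01,nsd02 -g /gpfs/gpfs1/.policytmp

  # mmbackup at the 4.1 level accepts the same style of options
  # (check the man page for your exact release)
  mmbackup /gpfs/home -t incremental -N nsd01,nsd02 -g /gpfs/home/.mmbackuptmp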
URL: From st.graf at fz-juelich.de Tue Oct 27 07:03:19 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Tue, 27 Oct 2015 08:03:19 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> Message-ID: <562F21B7.8040007@fz-juelich.de> We are running the mmbackup on an AIX system oslevel -s 6100-07-10-1415 Current GPFS build: "4.1.0.8 ". So we only use one node for the policy run. Stephan On 10/26/15 22:12, Wayne Sawdon wrote: > From: Stephan Graf > > For backup we use mmbackup (dsmc) > for the user HOME directory (no ILM) > #120 mio files => 3 hours get candidate list + x hour backup That seems rather slow. What version of GPFS are you running? How many nodes are you using? Are you using a "-g global shared directory"? The original mmapplypolicy code was targeted to a single node, so by default it still runs on a single node and you have to specify -N to run it in parallel. When you run multi-node there is a "-g" option that defines a global shared directory that must be visible to all nodes specified in the -N list. Using "-g" with "-N" enables a scale-out parallel algorithm that substantially reduces the time for candidate selection. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Tue Oct 27 09:02:52 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Tue, 27 Oct 2015 10:02:52 +0100 Subject: [gpfsug-discuss] Spectrum Scale v4.2 In-Reply-To: References: Message-ID: <201510270904.t9R940k4019623@d06av11.portsmouth.uk.ibm.com> see "IBM Spectrum Scale V4.2 delivers simple, efficient,and intelligent data management for highperformance,scale-out storage" http://www.ibm.com/common/ssi/rep_ca/8/897/ENUS215-398/ENUS215-398.PDF Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Tue Oct 27 10:47:43 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 27 Oct 2015 10:47:43 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> Message-ID: <1445942863.17909.89.camel@buzzard.phy.strath.ac.uk> On Mon, 2015-10-26 at 14:22 -0800, Wayne Sawdon wrote: [SNIP] > > > > can you do a synthetic recreate of the TSM HSM tape from backups? > > TSM stores data from backups and data from HSM in different pools. A > file that is both HSM'ed and backed up will have at least two copies > of data off-line. I suspect that losing a tape from the HSM pool will > have no effect on the backup pool, but you should verify that with > someone from TSM. > I am pretty sure that you have to restore the files first from backup, and it is a manual process. Least it was for me when a HSM tape went bad in the past. Had to use TSM to generate a list of the files on the HSM tape, and then feed that in to a dsmc restore, before doing a reconcile and removing the tape from the library for destruction. Finally all the files where punted back to tape. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From wsawdon at us.ibm.com Tue Oct 27 15:25:02 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Tue, 27 Oct 2015 07:25:02 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <562F21B7.8040007@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> Message-ID: <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 27 17:28:00 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 27 Oct 2015 17:28:00 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm Message-ID: Hi, If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. >From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. This appears to mean that quotas have to be double what we actually want to take account of the replication factor. Is this correct? Second part of the question. If a file is transferred to tape (or compressed maybe as well), does the file still count against quota, and how much for? As on hsm tape its no longer copies=2. Same for a compressed file, does the compressed file count as the original or compressed size against quota? I.e. Could a user accessing a compressed file suddenly go over quota by accessing the file? 
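A quick way to see what is actually being charged for a replicated file, and which figure each tool reports (file system, fileset and paths below are invented examples on a file system with data replication 2):

  dd if=/dev/zero of=/gpfs/gpfs1/proj/bigfile bs=1M count=100
  ls -l /gpfs/gpfs1/proj/bigfile                    # logical size: 100 MB
  du -h /gpfs/gpfs1/proj/bigfile                    # allocated: ~200 MB with two copies
  du -h --apparent-size /gpfs/gpfs1/proj/bigfile    # back to the logical 100 MB
  mmlsattr -L /gpfs/gpfs1/proj/bigfile              # shows the data replication factor
  mmlsquota -j proj gpfs1                           # block usage counts the replicated size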
Thanks Simon From Robert.Oesterlin at nuance.com Tue Oct 27 19:48:04 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 27 Oct 2015 19:48:04 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs Message-ID: <4E539EE4-596B-441C-9E60-46072E567765@nuance.com> With Spectrum Scale 4.2 announced, can anyone from IBM comment on what the outlook/process is for fixes and PTFs? When 4.1.1 came out, 4.1.0.X more or less dies, with 4.1.0.8 being the last level ? yes? Then move to 4.1.1 With 4.1.1 ? we are now at 4.1.1-2 and 4.2 is going to GA on 11/20/2015 Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Wed Oct 28 08:06:01 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Wed, 28 Oct 2015 09:06:01 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> Message-ID: <563081E9.2090605@fz-juelich.de> Hi Wayne! We are using -g, and we only want to run it on one node, so we don't use the -N option. Stephan On 10/27/15 16:25, Wayne Sawdon wrote: > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dan.Foster at bristol.ac.uk Wed Oct 28 10:06:10 2015 From: Dan.Foster at bristol.ac.uk (Dan Foster) Date: Wed, 28 Oct 2015 10:06:10 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm In-Reply-To: References: Message-ID: On 27 October 2015 at 17:28, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. > > From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. 
> > This appears to mean that quotas have to be double what we actually want to take account of the replication factor.
> > Is this correct?

This is what we observe here by default and currently have to double our fileset quotas to take this into account on replicated filesystems.

You've reminded me that I was going to ask this list if it's possible to report the un-replicated sizes? While the quota management is only a slight pain, what's reported to the user is more of a problem for us (e.g. via SMB share / df). We're considering replicating a lot more of our filesystems and it would be useful if it didn't appear that everyone's quotas had just doubled overnight.

Thanks, Dan.
--
Dan Foster | Senior Storage Systems Administrator | IT Services

From duersch at us.ibm.com Wed Oct 28 12:47:52 2015
From: duersch at us.ibm.com (Steve Duersch)
Date: Wed, 28 Oct 2015 08:47:52 -0400
Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs
Message-ID:

>>Is the plan to "encourage" the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future?
IBM will continue to create PTFs for the 4.1.1 stream.

Steve Duersch
Spectrum Scale (GPFS) FVTest
IBM Poughkeepsie, New York
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Robert.Oesterlin at nuance.com Wed Oct 28 13:06:56 2015
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Wed, 28 Oct 2015 13:06:56 +0000
Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs
In-Reply-To: References: Message-ID:

Hi Steve

Thanks - that's puzzling (surprising?) given that 4.1.1 hasn't really been out that long (less than 6 months). I'm in a position of deciding what my upgrade path and timeline should be. If I'm at 4.1.0.X and want to upgrade all my clusters, the "safer" bet is probably 4.1.1-X, but all the new features are going to end up on the 4.2.X. If 4.2 is going to GA in November, perhaps it's better to wait for the first 4.2 PTF package.

Bob Oesterlin
Sr Storage Engineer, Nuance Communications
507-269-0413

From: > on behalf of Steve Duersch >
Reply-To: gpfsug main discussion list >
Date: Wednesday, October 28, 2015 at 7:47 AM
To: "gpfsug-discuss at spectrumscale.org" >
Subject: Re: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs

IBM will continue to create PTFs for the 4.1.1 stream.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Kevin.Buterbaugh at Vanderbilt.Edu Wed Oct 28 13:09:52 2015
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Wed, 28 Oct 2015 13:09:52 +0000
Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs
In-Reply-To: References: Message-ID: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu>

All,

What about the 4.1.0-x stream? We're on 4.1.0-8 and will soon be applying an efix to it to take care of the snapshot deletion and "quotas are wrong" bugs. We've also got no immediate plans to go to either 4.1.1-x or 4.2 until they've had a chance to ... mature. It's not that big of a deal - I don't mind running on the efix for a while. Just curious. Thanks...

Kevin

On Oct 28, 2015, at 7:47 AM, Steve Duersch > wrote:
>>Is the plan to "encourage" the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future?
IBM will continue to create PTFs for the 4.1.1 stream.
Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Oct 28 13:15:30 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 28 Oct 2015 13:15:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs In-Reply-To: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> References: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05CF6CF0@CHI-EXCHANGEW1.w2k.jumptrading.com> IBM has stated that there will no longer be PTF releases for 4.1.0, and that 4.1.0-8 is the last PTF release. Thus you?ll have to choose between upgrading to 4.1.1 (which has the latest GPFS Protocols feature, hence the numbering change), or wait and go with the 4.2 release. I heard rumor from somebody at IBM (honestly can?t remember who) that the first 3 releases of any major release has some additional debugging turned up, which is turned off after on the fourth PTF release and those going forward. Does anybody at IBM want to confirm or deny this rumor? I?m also leery of going with the first major release of GPFS (or any software, like RHEL 7.0 for instance). Thanks, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: Wednesday, October 28, 2015 8:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs All, What about the 4.1.0-x stream? We?re on 4.1.0-8 and will soon be applying an efix to it to take care of the snapshot deletion and ?quotas are wrong? bugs. We?ve also go no immediate plans to go to either 4.1.1-x or 4.2 until they?ve had a chance to ? mature. It?s not that big of a deal - I don?t mind running on the efix for a while. Just curious. Thanks? Kevin On Oct 28, 2015, at 7:47 AM, Steve Duersch > wrote: >>Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? IBM will continue to create PTFs for the 4.1.1 stream. Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Oct 28 13:25:27 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 28 Oct 2015 13:25:27 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05CF6E05@CHI-EXCHANGEW1.w2k.jumptrading.com> I'm not sure what kind of report you're looking for, but the `du` command has a "--apparent-size" option that has this description: print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in (?sparse?) files, internal fragmentation, indirect blocks, and the like This can be used to get the actual amount of space that files are using. I think that mmrepquota and mmlsquota show twice the amount of space of the actual file due to the replication, but somebody correct me if I'm mistaken. I also would like to know what the output of the ILM "LIST" policy reports for KB_ALLOCATED for replicated files. Is it the replicated amount of data? Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Dan Foster Sent: Wednesday, October 28, 2015 5:06 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas, replication and hsm On 27 October 2015 at 17:28, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. > > From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. > > This appears to mean that quotas have to be double what we actually want to take account of the replication factor. > > Is this correct? This is what we obverse here by default and currently have to double our fileset quotas to take this is to account on replicated filesystems. You've reminded me that I was going to ask this list if it's possible to report the un-replicated sizes? While the quota management is only a slight pain, what's reported to the user is more of a problem for us(e.g. via SMB share / df ). We're considering replicating a lot more of our filesystems and it would be useful if it didn't appear that everyones quotas had just doubled overnight. Thanks, Dan. -- Dan Foster | Senior Storage Systems Administrator | IT Services _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From wsawdon at us.ibm.com Wed Oct 28 13:36:27 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Wed, 28 Oct 2015 05:36:27 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <563081E9.2090605@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> <563081E9.2090605@fz-juelich.de> Message-ID: <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com> You have to use both options even if -N is only the local node. Sorry, -Wayne From: Stephan Graf To: Date: 10/28/2015 01:06 AM Subject: Re: [gpfsug-discuss] ILM and Backup Question Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Wayne! We are using -g, and we only want to run it on one node, so we don't use the -N option. Stephan On 10/27/15 16:25, Wayne Sawdon wrote: > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From wsawdon at us.ibm.com Wed Oct 28 14:11:25 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Wed, 28 Oct 2015 06:11:25 -0800 Subject: [gpfsug-discuss] Quotas, replication and hsm In-Reply-To: References: Message-ID: <201510281412.t9SEChQo030691@d01av03.pok.ibm.com> > From: "Simon Thompson (Research Computing - IT Services)" > > > Second part of the question. If a file is transferred to tape (or > compressed maybe as well), does the file still count against quota, > and how much for? As on hsm tape its no longer copies=2. Same for a > compressed file, does the compressed file count as the original or > compressed size against quota? I.e. 
Could a user accessing a > compressed file suddenly go over quota by accessing the file? > Quotas account for space in the file system. If you migrate a user's file to tape, then that user is credited for the space saved. If a later access recalls the file then the user is again charged for the space. Note that HSM recall is done as "root" which bypasses the quota check -- this allows the file to be recalled even if it pushes the user past his quota limit. Compression (which is currently in beta) has the same properties. If you compress a file, then the user is credited with the space saved. When the file is uncompressed the user is again charged. Since uncompression is done by the "user" the quota check is enforced and uncompression can fail. This includes writes to a compressed file. > From: Bryan Banister > > I also would like to know what the output of the ILM "LIST" policy > reports for KB_ALLOCATED for replicated files. Is it the replicated > amount of data? > KB_ALLOCATED shows the same value that stat shows, So yes it shows the replicated amount of data actually used by the file. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Oct 28 14:48:11 2015 From: makaplan at us.ibm.com (makaplan at us.ibm.com) Date: Wed, 28 Oct 2015 09:48:11 -0500 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> <563081E9.2090605@fz-juelich.de> <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com> Message-ID: <201510281448.t9SEmFsr030044@d01av02.pok.ibm.com> IF you see one or more status messages like this: [I] %2$s Parallel-piped sort and policy evaluation. %1$llu files scanned. %3$s Then you are getting the (potentially) fastest version of the GPFS inode and policy scanning algorithm. You may also want to adjust the -a and -A options of the mmapplypolicy command, as mentioned in the command documentation. Oh I see the documentation for -A is wrong in many versions of the manual. There is an attempt to automagically estimate the proper number of buckets, based on the inodes allocated count. If you want to investigate performance more I recommend you use our debug option -d 7 or set environment variable MM_POLICY_DEBUG_BITS=7 - this will show you how the work is divided among the nodes and threads. --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Thu Oct 29 14:14:58 2015 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 29 Oct 2015 09:14:58 -0500 Subject: [gpfsug-discuss] Intro (new member) Message-ID: Hi, I have just joined the GPFS (Spectrum Scale) UG list. I work in the GPFS development team. I had the chance of attending the "Inaugural USA Meet the Devs" session in New York City on Oct 7, which was a valuable opportunity to hear from customers using the product. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... 
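Marc's debug suggestion above, spelled out as a command line (file system, policy file and node names here are placeholders):

  # show how mmapplypolicy divides the inode/policy scan across nodes and threads
  MM_POLICY_DEBUG_BITS=7 mmapplypolicy gpfs1 -P mypolicy.pol \
      -N nsd01,nsd02 -g /gpfs/gpfs1/.policytmp \
      -I test -L 2 2> /tmp/policy-debug.log
  # the -d 7 option Marc mentions has the same effect as the environment variable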
URL: From carlz at us.ibm.com Fri Oct 30 15:14:50 2015 From: carlz at us.ibm.com (Carl Zetie) Date: Fri, 30 Oct 2015 10:14:50 -0500 Subject: [gpfsug-discuss] Making an RFE Public (and an intro) Message-ID: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com> First the intro: I am the new Product Manager joining the Spectrum Scale team, taking the place of Janet Ellsworth. I'm looking forward to meeting with you all. I also have some news about RFEs: we are working to enable you to choose whether your RFEs for Scale are private or public. I know that many of you have requested public RFEs so that other people can see and vote on RFEs. We'd like to see that too as it's very valuable information for us (as well as reducing duplicates). So here's what we're doing: Short term: If you have an existing RFE that you would like to see made Public, please email me with the ID of the RFE. You can find my email address at the foot of this message. PLEASE don't email the entire list! Medium term: We are working to allow you to choose at the time of submission whether a request will be Private or Public. Unfortunately for technical internal reasons we can't simply make the Public / Private field selectable at submission time (don't ask!), so instead we are creating two submission queues, one for Private RFEs and another for public RFEs. So when you submit an RFE in future you'll start by selecting the appropriate queue. Inside IBM, they all go into the same evaluation process. As soon as I have an update on the availability of this fix, I will share with the group. Note that even for Public requests, some fields remain Private and hidden from other viewers, e.g. Business Case (look for the "key" icon next to the field to confirm). regards, Carl Carl Zetie Product Manager for Spectrum Scale, IBM (540) 882 9353 ][ 15750 Brookhill Ct, Waterford VA 20197 carlz at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfhamano at us.ibm.com Fri Oct 30 15:29:58 2015 From: jfhamano at us.ibm.com (John Hamano) Date: Fri, 30 Oct 2015 07:29:58 -0800 Subject: [gpfsug-discuss] Making an RFE Public (and an intro) In-Reply-To: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com> References: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com> Message-ID: <201510301530.t9UFUM0M004729@d03av05.boulder.ibm.com> Hi Carl, welcome and congratulations on your new role. I am North America Brand Sales for ESS and Spectrum Scale. Let me know when you have some time next weekg to talk. From: Carl Zetie/Fairfax/IBM at IBMUS To: gpfsug-discuss at spectrumscale.org, Date: 10/30/2015 08:20 AM Subject: [gpfsug-discuss] Making an RFE Public (and an intro) Sent by: gpfsug-discuss-bounces at spectrumscale.org First the intro: I am the new Product Manager joining the Spectrum Scale team, taking the place of Janet Ellsworth. I'm looking forward to meeting with you all. I also have some news about RFEs: we are working to enable you to choose whether your RFEs for Scale are private or public. I know that many of you have requested public RFEs so that other people can see and vote on RFEs. We'd like to see that too as it's very valuable information for us (as well as reducing duplicates). So here's what we're doing: Short term: If you have an existing RFE that you would like to see made Public, please email me with the ID of the RFE. You can find my email address at the foot of this message. PLEASE don't email the entire list! 
Medium term: We are working to allow you to choose at the time of submission whether a request will be Private or Public. Unfortunately for technical internal reasons we can't simply make the Public / Private field selectable at submission time (don't ask!), so instead we are creating two submission queues, one for Private RFEs and another for public RFEs. So when you submit an RFE in future you'll start by selecting the appropriate queue. Inside IBM, they all go into the same evaluation process. As soon as I have an update on the availability of this fix, I will share with the group. Note that even for Public requests, some fields remain Private and hidden from other viewers, e.g. Business Case (look for the "key" icon next to the field to confirm). regards, Carl Carl Zetie Product Manager for Spectrum Scale, IBM (540) 882 9353 ][ 15750 Brookhill Ct, Waterford VA 20197 carlz at us.ibm.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From PATBYRNE at uk.ibm.com Thu Oct 1 11:09:29 2015 From: PATBYRNE at uk.ibm.com (Patrick Byrne) Date: Thu, 1 Oct 2015 10:09:29 +0000 Subject: [gpfsug-discuss] Problem Determination Message-ID: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Oct 1 13:39:25 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Oct 2015 12:39:25 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. 
- The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 2 17:44:24 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 2 Oct 2015 16:44:24 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05C8CE44@CHI-EXCHANGEW1.w2k.jumptrading.com> I would like to strongly echo what Bob has stated, especially the documentation or wrong documentation, and I have in-lining some comments below. I liken GPFS to a critical care patient at the hospital. You have to check on the state regularly, know the running heart rate (e.g. waiters), the response of every component from disk, to networks, to server load, etc. 
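The kind of "heart rate" check Bryan describes can be scripted crudely with mmdiag; a rough sketch only (node names are placeholders, and it assumes passwordless root ssh to the cluster members):

  #!/bin/bash
  # show the five longest-running waiters on each node
  for node in nsd01 nsd02 client01; do
      echo "== $node =="
      ssh "$node" /usr/lpp/mmfs/bin/mmdiag --waiters 2>/dev/null | \
          grep 'waiting' | sort -k3 -nr | head -5
  done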
When a problem occurs, running tests (such as nsdperf) to help isolate the problem quickly is crucial. Capturing GPFS trace data is also very important if the problem isn?t obvious. But then you have to wait for IBM support to parse the information and give you their analysis of the situation. It would be great to get an advanced troubleshooting document that describes how to read the output of `mmfsadm dump` commands and the GPFS trace report that is generated. Cheers, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Oesterlin, Robert Sent: Thursday, October 01, 2015 7:39 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem Determination Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. [Bryan: Also please, please provide a way to check whether or not the configuration parameters need to be changed. I assume that there is a `mmfsadm dump` command that can tell you whether the config parameter needs to be changed, if not make one! Just stating something like ?This could be increased to XX value for very large clusters? is not very helpful. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. [Bryan: I know that Scott Fadden is a busy man, so I would recommend helping distribute the workload of maintaining the wiki documentation. This data should be reviewed on a more regular basis, at least once for each major release I would hope, and updated or deleted if found to be out of date.] - The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. [Bryan: From what I?ve heard, IBM is actively working to make the deadlock amelioration logic better. 
I agree that firing off traces can cause more problems, and we have turned off the automated collection as well. We are going to work on enabling the collection of some data during these events to help ensure we get enough data for IBM to analyze the problem.] - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. [Bryan: The GPFS callback facilities are very useful for setting up alerts, but not well documented or advertised by the GPFS manuals. I hope to see more callback capabilities added to help monitor all aspects of the GPFS cluster and file systems] mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 2 17:58:41 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 2 Oct 2015 16:58:41 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com>, Message-ID: I agree on docs, particularly on mmdiag, I think things like --lroc are not documented. I'm also not sure that --network always gives accurate network stats. (we were doing some ha failure testing where we have split site in and fabrics, yet the network counters didn't change even when the local ib nsd servers were shut down). It would be nice also to have a set of Icinga/Nagios plugins from IBM, maybe in samples whcich are updated on each release with new feature checks. And not problem determination, but id really like to see an inflight non disruptive upgrade path. Particularly as we run vms off gpfs, its bot always practical or possible to move vms, so would be nice to have upgrade in flight (not suggesting this would be a quick thing to implement). Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 01 October 2015 13:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem Determination Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. - The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. 
Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited From ewahl at osc.edu Fri Oct 2 19:00:46 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 2 Oct 2015 18:00:46 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> I'm not yet in the 4.x release stream so this may be taken with a grain (or more) of salt as we say. PLEASE keep the ability of commands to set -x or dump debug when the env DEBUG=1 is set. This has been extremely useful over the years. Granted I've never worked out why sometimes we see odd little things like machines deciding they suddenly need an FPO license or one nsd server suddenly decides it's name is part of the FQDN instead of just it's hostname and only for certain commands, but it's DAMN useful. Minor issues especially can be tracked down with it. Undocumented features and logged items abound. I'd say start there. 
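(For anyone who has not used the toggle Ed mentions: it is just an environment variable that the mm* wrapper scripts check, so a quick way to capture the shell trace for a misbehaving command is something along these lines -- illustrative only, since which commands honour it and exactly what they print varies by release:

  # run one admin command with debug tracing on, keeping the trace
  # separate from the normal output so it can be read afterwards
  DEBUG=1 mmlscluster > /tmp/mmlscluster.out 2> /tmp/mmlscluster.trace

The trace usually shows which helper scripts and ts* binaries the wrapper actually invoked, which is often enough to see where things went sideways.)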
This is one area where it is definitely more art than science with Spectrum Scale (meh GPFS still sounds better. So does Shark. Can we go back to calling it the Shark Server Project?) Complete failure of the verbs layer and fallback to other defined networks would be nice to know about during operation. It's excellent about telling you at startup but not so much during operation, at least in 3.5. I imagine with the 'automated compatibility layer building' I'll be looking for some serious amounts of PD for the issues we _will_ see there. We frequently build against kernels we are not yet running at this site, so this needs well documented PD and resolution. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Patrick Byrne [PATBYRNE at uk.ibm.com] Sent: Thursday, October 01, 2015 6:09 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Oct 2 21:27:17 2015 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 2 Oct 2015 16:27:17 -0400 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: I would like to see better performance metrics / counters from GPFS. I know we already have mmpmon, which is generally really good -- I've done some fun things with it and it has been a great tool. And, I realize that there is supposedly a new monitoring framework in 4.x.. which I haven't played with yet. But, Generally it would be extremely helpful to get synchronized (across all nodes) high accuracy counters of data flow, number of waiters, page pool stats, distribution of data from one layer to another down to NSDs.. etc etc etc. 
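(For anyone who has not tried mmpmon, the kind of counters it exposes today can be sampled with a small input file; a rough sketch, with the options quoted from memory so check the man page for your release:

  # sample this node's per-file-system I/O counters every 5 seconds,
  # indefinitely, in machine-parseable form
  echo fs_io_s > /tmp/pmon.in
  mmpmon -p -i /tmp/pmon.in -d 5000 -r 0

That gives per-node, per-file-system read/write byte and operation counts, which is useful but a long way from the synchronized, cluster-wide view described above.)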
I believe many of these counters already exist, but they're hidden in some mmfsadm xx command that one needs to troll through with possible performance implications. mmpmon can do some of this, but it's only a handful of counters, it's hard to say how synchronized the counters are across nodes, and I've personally seen an mmpmon run go bad and take down a cluster. It would be nice if it were pushed out, or provided in a safe manner with the design and expectation of "log-everything forever continuously". As GSS/ESS systems start popping up, I realize they have this other monitoring framework to watch the VD throughputs.. which is great. But, that doesn't allow us to monitor more traditional types. Would be nice to monitor it all together the same way so we don't miss-out on monitoring half the infrastructure or buying a cluster with some fancy GUI that can't do what we want.. -Zach On Fri, Oct 2, 2015 at 2:00 PM, Wahl, Edward wrote: > I'm not yet in the 4.x release stream so this may be taken with a grain (or > more) of salt as we say. > > PLEASE keep the ability of commands to set -x or dump debug when the env > DEBUG=1 is set. This has been extremely useful over the years. Granted > I've never worked out why sometimes we see odd little things like machines > deciding they suddenly need an FPO license or one nsd server suddenly > decides it's name is part of the FQDN instead of just it's hostname and only > for certain commands, but it's DAMN useful. Minor issues especially can be > tracked down with it. > > Undocumented features and logged items abound. I'd say start there. This > is one area where it is definitely more art than science with Spectrum Scale > (meh GPFS still sounds better. So does Shark. Can we go back to calling it > the Shark Server Project?) > > Complete failure of the verbs layer and fallback to other defined networks > would be nice to know about during operation. It's excellent about telling > you at startup but not so much during operation, at least in 3.5. > > I imagine with the 'automated compatibility layer building' I'll be looking > for some serious amounts of PD for the issues we _will_ see there. We > frequently build against kernels we are not yet running at this site, so > this needs well documented PD and resolution. > > Ed Wahl > OSC > > > ________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] > on behalf of Patrick Byrne [PATBYRNE at uk.ibm.com] > Sent: Thursday, October 01, 2015 6:09 AM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Problem Determination > > Hi all, > > As I'm sure some of you aware, problem determination is an area that we are > looking to try and make significant improvements to over the coming releases > of Spectrum Scale. To help us target the areas we work to improve and make > it as useful as possible I am trying to get as much feedback as I can about > different problems users have, and how people go about solving them. > > I am interested in hearing everything from day to day annoyances to problems > that have caused major frustration in trying to track down the root cause. > Where possible it would be great to hear how the problems were dealt with as > well, so that others can benefit from your experience. Feel free to reply to > the mailing list - maybe others have seen similar problems and could provide > tips for the future - or to me directly if you'd prefer > (patbyrne at uk.ibm.com). 
> > On a related note, in 4.1.1 there was a component added that monitors the > state of the various protocols that are now supported (NFS, SMB, Object). > The output from this is available with the 'mmces state' and 'mmces events' > CLIs and I would like to get feedback from anyone who has had the chance > make use of this. Is it useful? How could it be improved? We are looking at > the possibility of extending this component to cover more than just > protocols, so any feedback would be greatly appreciated. > > Thanks in advance, > > Patrick Byrne > IBM Spectrum Scale - Development Engineer > IBM Systems - Manchester Lab > IBM UK Limited > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From Luke.Raimbach at crick.ac.uk Mon Oct 5 13:57:14 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 5 Oct 2015 12:57:14 +0000 Subject: [gpfsug-discuss] Independent Inode Space Limit Message-ID: Hi All, When creating an independent inode space, I see the valid range for the number of inodes is between 1024 and 4294967294. Is the ~4.2billion upper limit something that can be increased in the future? I also see that the first 1024 inodes are immediately allocated upon creation. I assume these are allocated to internal data structures and are a copy of a subset of the first 4038 inodes allocated for new file systems? It would be useful to know if these internal structures are fixed for independent filesets and if they are not, what factors determine their layout (for performance purposes). Many Thanks, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From usa-principal at gpfsug.org Mon Oct 5 14:55:15 2015 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Mon, 05 Oct 2015 09:55:15 -0400 Subject: [gpfsug-discuss] Final Reminder: Inaugural US "Meet the Developers" Message-ID: <9656d0110c2be4b339ec5ce662409b8e@webmail.gpfsug.org> A last reminder to check in with Janet if you have not done so already. Looking forward to this event on Wednesday this week. Best, Kristy --- Hello Everyone, Here is a reminder about our inaugural US "Meet the Developers" session. Details are below, and please send an e-mail to Janet Ellsworth (janetell at us.ibm.com) by next Friday September 18th if you wish to attend. Janet is on the product management team for Spectrum Scale and is helping with the logistics for this first event. Date: Wednesday, October 7th Place: IBM building at 590 Madison Avenue, New York City Time: 12:30 to 5 PM (Lunch will be served at 12:30, and sessions will start between 1 and 1:30 PM. Afternoon snacks will be served as well :-) Agenda IBM development architect to present the new protocols support that was released with Spectrum Scale 4.1.1 in June. IBM developer to demo future Graphical User Interface ***Member of user community to present an experience with using Spectrum Scale (still seeking volunteers for this !)*** Open Q&A with the development team We are happy to have heard from many of you so far who would like to attend. 
We still have room however, so please get in touch by the 9/18 date if you would like to attend. ***We also need someone to share an experience or use case scenario with Spectrum Scale for this event, so please let Janet know if you are willing to do that too.*** As you have likely seen, we are also working on the agenda and timing for day-long GPFS US UG event in Austin during November aligned with SC15 and there will be more details on that coming soon. From secretary at gpfsug.org Wed Oct 7 12:50:51 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 07 Oct 2015 12:50:51 +0100 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: <813d82bd5074b90c3a67acc85a03995b@webmail.gpfsug.org> Hi All, There are still places available for the next 'Meet the Devs' event in Edinburgh on Friday 23rd October from 10:30/11am until 3/3:30pm. It's a great opportunity for you to meet with developers and talk through specific issues as well as learn more from the experts. Location: Room 2009a, Information Services, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD Google maps link: https://goo.gl/maps/Ta7DQ Agenda: - GUI - 4.2 Updates/show and tell - Open conversation on any areas of interest attendees may have Lunch and refreshments will be provided. Please email me (secretary at gpfsug.org) if you would like to attend including any particular topics of interest you would like to discuss. Best wishes, -- Claire O'Toole GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org From service at metamodul.com Wed Oct 7 16:06:56 2015 From: service at metamodul.com (service at metamodul.com) Date: Wed, 07 Oct 2015 17:06:56 +0200 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: Hi Claire, I will attend the meeting. Hans-Joachim Ehlers MetaModul GmbH Germany Cheers Hajo Von Samsung Mobile gesendet
-------- Original message --------
From: Secretary GPFS UG
Date: 2015.10.07 13:50 (GMT+01:00)
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Places available: Meet the Devs
Hi All, There are still places available for the next 'Meet the Devs' event in Edinburgh on Friday 23rd October from 10:30/11am until 3/3:30pm. It's a great opportunity for you to meet with developers and talk through specific issues as well as learn more from the experts. Location: Room 2009a, Information Services, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD Google maps link: https://goo.gl/maps/Ta7DQ Agenda: - GUI - 4.2 Updates/show and tell - Open conversation on any areas of interest attendees may have Lunch and refreshments will be provided. Please email me (secretary at gpfsug.org) if you would like to attend including any particular topics of interest you would like to discuss. Best wishes, -- Claire O'Toole GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Wed Oct 7 19:59:26 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Wed, 7 Oct 2015 18:59:26 +0000 Subject: [gpfsug-discuss] new member, first post Message-ID: sitting here in the US GPFS UG meeting in NYC and just found out about this list. We've been a GPFS user for many years, first with integrated DDN support, but now also with a GSS system. we have about 4PB of raw GPFS storage and 1 billion inodes. We keep our metadata on TMS ramsan for very fast policy execution for tiering and migration. We use GPFS to hold the primary source data from our custom supercomputers. We have many policies executed periodically for managing the data, including writing certain files to dedicated fast pools and then migrating the data off to wide swaths of disk for read access from cluster clients. One pain point, which I'm sure many of the rest of you have seen, restripe operations for just metadata are unnecessarily slow. If we experience a flash module failure and need to restripe, it also has to check all of the data. I have a feature request open to make metadata restripes only look at metadata (since it is on RamSan/FlashCache, this should be very fast) instead of scanning everything, which can and does take months with performance impacts. Doug Hughes D. E. Shaw Research, LLC. Sent from my android device. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Thu Oct 8 20:37:05 2015 From: chair at gpfsug.org (GPFS UG Chair (Simon Thompson)) Date: Thu, 08 Oct 2015 20:37:05 +0100 Subject: [gpfsug-discuss] User group update Message-ID: Hi, I thought I'd drop an update to the group on various admin things which have been going on behind the scenes. The first US meet the devs event was held yesterday, and I'm hoping someone who went will be preparing a blog post to cover the event a little. I know a bunch of people have joined the mailing list since then, so welcome to the group to all of those! ** User Group Engagement with IBM ** I also met with Akhtar yesterday who is the IBM VP for Technical Computing Developments (which includes Spectrum Scale). He was in the UK for a few days at the IBM Manchester Labs, so we managed to squeeze a meeting to talk a bit about the UG. I'm very pleased that Akhtar confirmed IBMs commitment to help the user group in both the UK and USA with developer support for the meet the devs and annual group meetings. 
I'd like to extend my thanks to those at IBM who are actively supporting the group in so many ways. One idea we have been mulling over is filming the talks at next year's events and then putting those on Youtube for people who can't get there. IBM have given us tentative agreement to do this, subject to a few conditions. Most importantly that the UG and IBM ensure we don't publish customer or IBM items which are NDA/not for general public consumption. I'm hopeful we can get this all approved and if we do, we'll be looking to the community to help us out (anyone got digital camera equipment we might be able to borrow, or some help with editing down afterwards?) Whilst in Manchester I also met with Patrick to talk over the various emails people have sent in about problem determination, which Patrick will be taking to the dev meeting in a few weeks. It sounds like there are some interesting ideas kicking about, so hopefully we'll get some value from the user group input. Some of the new features in 4.2 were also demo'd and for those who might not have been to a meet the devs session and are interested in the upcoming GUI, it is now in public beta, head over to developer works for more details: https://www.ibm.com/developerworks/community/forums/html/topic?id=4dc34bf1- 17d1-4dc0-af72-6dc5a3f93e82&ps=25 ** User Group Feedback ** Over the past few months, I've also been collecting feedback from people, either comments on the mailing list, or those who I've spoken to, which was all collated and sent in to IBM, we'll hopefully be getting some feedback on that in the next few weeks - there's a bunch of preliminary answers now, but a few places we still need a bit of clarification. There's also some longer term discussion going on about GPFS and cloud (in particular to those of us in scientific areas). We'll feed that back as and when we get responses we can share. We'd like to ensure that we gather as much feedback from users so that we can collectively take it to IBM, so please do continue to post comments etc to the mailing list. ** Diary Dates ** A few dates for diaries: * Meet the Devs in Edinburgh - Friday 23rd October 2015 * GPFS UG Meeting @ SC15 in Austin, USA - Sunday 15th November 2015 * GPFS UG Meeting @ Computing Insight UK, Coventry, UK - Tuesday 8th December 2015 (Note you must be registered also for CIUK) * GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May 2016 ** User Group Admin ** Within the committee, we've been talking about how we can extend the reach of the group, so we may be reaching out to a few group members to take this forward. Of course if anyone has suggestions on how we can ensure we reach as many people as possible, please let me know, either via the mailing list of directly by email. I know there are lot of people on the mailing list who don't post (regularly), so I'd be interested to hear if you find the group mailing list discussion useful, if you feel there are barriers to asking questions, or what you'd like to see coming out of the user group - please feel free to email me directly if you'd like to comment on any of this! We've also registered spectrumscale.org to point to the user group, so you may start to see the group marketed as the Spectrum Scale User Group, but rest assured, its still the same old GPFS User Group ;-) Just a reminder that we made the mailing list so that only members can post. This was to reduce the amount of spam coming in and being held for moderation (and a few legit posts got lost this way). 
If you do want to post, but not receive the emails, you can set this as an option in the mailing list software. Finally, I've also fixed the mailing list archives, so these are now available at: http://www.gpfsug.org/pipermail/gpfsug-discuss/ Simon GPFS UG, UK Chair From L.A.Hurst at bham.ac.uk Fri Oct 9 09:25:52 2015 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst (IT Services)) Date: Fri, 9 Oct 2015 08:25:52 +0000 Subject: [gpfsug-discuss] User group update Message-ID: On 08/10/2015 20:37, "gpfsug-discuss-bounces at gpfsug.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May >2016 Daft question: is that 17th *and* 18th or 17th *or* 18th (presumably TBC)? Thanks, Laurence -- Laurence Hurst Research Support, IT Services, University of Birmingham From S.J.Thompson at bham.ac.uk Fri Oct 9 10:00:11 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 9 Oct 2015 09:00:11 +0000 Subject: [gpfsug-discuss] User group update In-Reply-To: References: Message-ID: Both days. May 2016 is a two day event. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Laurence Alexander Hurst (IT Services) [L.A.Hurst at bham.ac.uk] Sent: 09 October 2015 09:25 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] User group update On 08/10/2015 20:37, "gpfsug-discuss-bounces at gpfsug.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May >2016 Daft question: is that 17th *and* 18th or 17th *or* 18th (presumably TBC)? Thanks, Laurence -- Laurence Hurst Research Support, IT Services, University of Birmingham _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Sat Oct 10 14:54:22 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sat, 10 Oct 2015 13:54:22 +0000 Subject: [gpfsug-discuss] User group update Message-ID: > >We've also registered spectrumscale.org to point to the user group, so you >may start to see the group marketed as the Spectrum Scale User Group, but >rest assured, its still the same old GPFS User Group ;-) And this is just a test mail to ensure that mail to gpfsug-discuss at spectrumscale.org gets through OK. The old address should also still work. Simon From S.J.Thompson at bham.ac.uk Sat Oct 10 14:55:55 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sat, 10 Oct 2015 13:55:55 +0000 Subject: [gpfsug-discuss] User group update In-Reply-To: References: Message-ID: On 10/10/2015 14:54, "Simon Thompson (Research Computing - IT Services)" wrote: >> >>We've also registered spectrumscale.org to point to the user group, so >>you >>may start to see the group marketed as the Spectrum Scale User Group, but >>rest assured, its still the same old GPFS User Group ;-) > >And this is just a test mail to ensure that mail to >gpfsug-discuss at spectrumscale.org gets through OK. The old address should >also still work. And checking the old address still works fine as well. 
Simon From Robert.Oesterlin at nuance.com Tue Oct 13 03:03:45 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 13 Oct 2015 02:03:45 +0000 Subject: [gpfsug-discuss] User group Meeting at SC15 - Registration Message-ID: We?d like to have all those attending the user group meeting at SC15 to register ? details are below. Thanks to IBM for getting the space and arranging all the details. I?ll post a more detailed agenda soon. Looking forward to meeting everyone! Location: JW Marriott 110 E 2nd Street Austin, Texas United States Date and Time: Sunday Nov 15, 1:00 PM?5:30 PM Agenda: - Latest IBM Spectrum Scale enhancements - Future directions and roadmap* (NDA required) - Newer usecases and User presentations Registration: Please register at the below link to book your seat. https://www-950.ibm.com/events/wwe/grp/grp017.nsf/v17_agenda?openform&seminar=99QNTNES&locale=en_US&S_TACT=sales Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Sat Oct 17 20:51:50 2015 From: chair at spectrumscale.org (GPFS UG Chair (Simon Thompson)) Date: Sat, 17 Oct 2015 20:51:50 +0100 Subject: [gpfsug-discuss] Blog on USA Meet the Devs Message-ID: Hi All, Kirsty wrote a blog post on the inaugural meet the devs in the USA. You can find it here: http://www.spectrumscale.org/inaugural-usa-meet-the-devs/ Thanks to Kristy, Bob and Pallavi for organising, the IBM devs and the group members giving talks. Simon From Tomasz.Wolski at ts.fujitsu.com Wed Oct 21 15:23:54 2015 From: Tomasz.Wolski at ts.fujitsu.com (Wolski, Tomasz) Date: Wed, 21 Oct 2015 16:23:54 +0200 Subject: [gpfsug-discuss] Intro Message-ID: Hi All, My name is Tomasz Wolski and I?m development engineer at Fujitsu Technology Solutions in Lodz, Poland. We?ve been using GPFS in our main product, which is ETERNUS CS8000, for many years now. GPFS helps us to build a consolidation of backup and archiving solutions for our end customers. We make use of GPFS snapshots, NIFS/CIFS services, GPFS API for our internal components and many many more .. :) My main responsibility, except developing new features for our system, is integration new GPFS versions into our system and bug tracking GPFS issues. Best regards, Tomasz Wolski -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 23 15:04:49 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 23 Oct 2015 14:04:49 +0000 Subject: [gpfsug-discuss] Independent Inode Space Limit Message-ID: >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? 
Thanks Simon From sfadden at us.ibm.com Fri Oct 23 13:42:14 2015 From: sfadden at us.ibm.com (Scott Fadden) Date: Fri, 23 Oct 2015 07:42:14 -0500 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: References: Message-ID: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> GPFS limits the max inodes based on metadata space. Add more metadata space and you should be able to add more inodes. Scott Fadden Spectrum Scale - Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/storage/spectrum/scale From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 10/23/2015 09:05 AM Subject: Re: [gpfsug-discuss] Independent Inode Space Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL:
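As a practical footnote to the answer above: the allocated and maximum inode counts for an independent fileset can be checked, and the limit raised, from the command line. A minimal sketch -- the file system name gpfs01, fileset name fset1 and the new limit are all invented here:

  # show per-fileset inode limits and usage (long listing)
  mmlsfileset gpfs01 -L
  # raise the maximum inodes for one independent fileset
  mmchfileset gpfs01 fset1 --inode-limit 20000000

Whether the new limit is actually reachable still comes down to how much metadata space the file system has, as noted above.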
From wsawdon at us.ibm.com Fri Oct 23 16:25:33 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Fri, 23 Oct 2015 08:25:33 -0700 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> References: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> Message-ID: <201510231525.t9NFPr1G010768@d03av04.boulder.ibm.com> >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Independent filesets don't have the internal structures that the file system has. Other than the fileset's root directory all of the remaining inodes can be allocated to user files. Inodes are always allocated in full metadata blocks. The inodes for an independent fileset are allocated in their own blocks. This makes fileset snapshots more efficient, since a copy-on-write of the block of inodes will only copy inodes in the fileset. The inode blocks for all filesets are in the same inode file, but the blocks for each independent fileset are strided, making them easy to prefetch for policy scans. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From kallbac at iu.edu Mon Oct 26 02:38:52 2015 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Sun, 25 Oct 2015 22:38:52 -0400 Subject: [gpfsug-discuss] ILM and Backup Question Message-ID: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way.
I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From st.graf at fz-juelich.de Mon Oct 26 08:43:33 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 26 Oct 2015 09:43:33 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: <562DE7B5.7080303@fz-juelich.de> Hi! We at J?lich Supercomputing Centre have two ILM managed file systems (GPFS and HSM from TSM). #50 mio files + 10 PB data on tape #30 mio files + 8 PB data on tape For backup we use mmbackup (dsmc) for the user HOME directory (no ILM) #120 mio files => 3 hours get candidate list + x hour backup We use also mmbackup for the ILM managed filesystem. Policy: the file must be backed up first before migrated to tape 2-3 hour for candidate list + x hours/days/weeks backups (!!!) -> a metadata change (e.g. renaming a directory by the user) enforces a new backup of the files which causes a very expensive tape inline copy! Greetings from J?lich, Germany Stephan On 10/26/15 03:38, Kallback-Rose, Kristy A wrote: Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. 
Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Mon Oct 26 13:42:47 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Mon, 26 Oct 2015 13:42:47 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: We have all of our GPFSmetadata on FlashCache devices (nee Ramsan) and that helps a lot. We also have our data going into monotonically increasing buckets of about 30TB that we call lockers (e.g. locker100, locker101, locker102), with 1 primary active at a time. We have an hourly job that scans the most recent 2 lockers (taked about 45 seconds each) to generate a file list using the ILM 'LIST' policy of all files that have been modified or created in the last hour. That goes to a file that has all of the names which then trickles to a custom backup daemon that has up to 10 threads for rsyncing these over to our HSM server (running GPFS/TSM space management). From there things automatically get backed up and archived. Not all hourlies are necessarily complete (we can't guarantee that nobody is still hanging on to $lockernum-2 for instance), so we have a daily that scans the entire 3PB to find anything created/updated in the last 24 hours and does an rsync on that. There's no harm in duplication of hourlies from the rsync perspective because rsync takes care of that (already exists on destination). The daily job takes about 45 minutes. Needless to say it would be impossible without metadata on a fast flash device. Sent from my android device. -----Original Message----- From: "Kallback-Rose, Kristy A" To: gpfsug main discussion list Sent: Sun, 25 Oct 2015 22:39 Subject: [gpfsug-discuss] ILM and Backup Question Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Mon Oct 26 13:42:47 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Mon, 26 Oct 2015 13:42:47 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: We have all of our GPFSmetadata on FlashCache devices (nee Ramsan) and that helps a lot. We also have our data going into monotonically increasing buckets of about 30TB that we call lockers (e.g. 
locker100, locker101, locker102), with 1 primary active at a time. We have an hourly job that scans the most recent 2 lockers (taked about 45 seconds each) to generate a file list using the ILM 'LIST' policy of all files that have been modified or created in the last hour. That goes to a file that has all of the names which then trickles to a custom backup daemon that has up to 10 threads for rsyncing these over to our HSM server (running GPFS/TSM space management). From there things automatically get backed up and archived. Not all hourlies are necessarily complete (we can't guarantee that nobody is still hanging on to $lockernum-2 for instance), so we have a daily that scans the entire 3PB to find anything created/updated in the last 24 hours and does an rsync on that. There's no harm in duplication of hourlies from the rsync perspective because rsync takes care of that (already exists on destination). The daily job takes about 45 minutes. Needless to say it would be impossible without metadata on a fast flash device. Sent from my android device. -----Original Message----- From: "Kallback-Rose, Kristy A" To: gpfsug main discussion list Sent: Sun, 25 Oct 2015 22:39 Subject: [gpfsug-discuss] ILM and Backup Question Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 26 20:15:26 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 26 Oct 2015 20:15:26 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: Hi Kristy, Yes thanks for picking this up. So we (UoB) have 3 GPFS environments, each with different approaches. 1. OpenStack (GPFS as infrastructure) - we don't back this up at all. Partly this is because we are still in pilot phase, and partly because we also have ~7PB CEPH over 4 sites for this project, and the longer term aim is for us to ensure data sets and important VM images are copied into the CEPH store (and then replicated to at least 1 other site). We have some challenges with this, how should we do this? We're sorta thinging about maybe going down the irods route for this, policy scan the FS maybe, add xattr onto important data, and use that to get irods to send copies into CEPH (somehow). So this would be a bit of a hybrid home-grown solution going on here. Anyone got suggestions about how to approach this? I know IBM are now an irods consortium member, so any magic coming from IBM to integrate GFPS and irods? 2. HPC. 
We differentiate on our HPC file-system between backed up and non backed up space. Mostly its non backed up, where we encourage users to keep scratch data sets. We provide a small(ish) home directory which is backed up with TSM to tape, and also backup applications and system configs of the system. We use a bunch of jobs to sync some configs into local git which also is stored in the backed up part of the FS, so things like switch configs, icinga config can be backed up sanely. 3. Research Data Storage. This is a large bulk data storage solution. So far its not actually that large (few hundred TB), so we take the traditional TSM back to tape approach (its also sync replicated between data centres). We're already starting to see some possible slowness on this with data ingest and we've only just launched the service. Maybe that is a cause of launching that we suddenly see high data ingest. We are also experimenting with HSM to tape, but other than that we have no other ILM policies - only two tiers of disk, SAS for metadata and NL-SAS for bulk data. I'd like to see a flash tier in there for Metadata, which would free SAS drives and so we might be more into ILM policies. We have some more testing with snapshots to do, and have some questions about recovery of HSM files if the FS is snapshotted. Anyone any experience with this with 4.1 upwards versions of GPFS? Straight TSM backup for us means we can end up with 6 copies of data - once per data centre, backup + offsite backup tape set, HSM pool + offsite copy of HSM pool. (If an HSM tape fails, how do we know what to restore from backup? Hence we make copies of the HSM tapes as well). As our backups run on TSM, it uses the policy engine and mmbackup, so we only backup changes and new files, and never backup twice from the FS. Does anyone know how TSM backups handle XATTRs? This is one of the questions that was raised at meet the devs. Or even other attributes like immutability, as unless you are in complaint mode, its possible for immutable files to be deleted in some cases. In fact this is an interesting topic, it just occurred to me, what happens if your HSM tape fails and it contained immutable files. Would it be possible to recover these files if you don't have a copy of the HSM tape? - can you do a synthetic recreate of the TSM HSM tape from backups? We typically tell users that backups are for DR purposes, but that we'll make efforts to try and restore files subject to resource availability. Is anyone using SOBAR? What is your rationale for this? I can see that at scale, there are lot of benefits to this. But how do you handle users corrupting/deleting files etc? My understanding of SOBAR is that it doesn't give you the same ability to recover versions of files, deletions etc that straight TSM backup does. (this is something I've been meaning to raise for a while here). So what do others do? Do you have similar approaches to not backing up some types of data/areas? Do you use TSM or home-grown solutions? Or even other commercial backup solutions? What are your rationales for making decisions on backup approaches? Has anyone built their own DMAPI type interface for doing these sorts of things? Snapshots only? Do you allow users to restore themselves? If you are using ILM, are you doing it with straight policy, or is TSM playing part of the game? (If people want to comment anonymously on this without committing their company on list, happy to take email to the chair@ address and forward on anonymously to the group). 
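On the tagging idea in (1), the selection side at least is easy to prototype with a plain LIST policy keyed on an extended attribute. A rough, untested sketch -- the attribute name, paths, node class and output prefix are all invented:

  cat > /tmp/tagged.pol <<'EOF'
  /* with -I defer the matches are simply written to a file list */
  RULE 'ext' EXTERNAL LIST 'tagged' EXEC ''
  /* pick up anything carrying the (made-up) user.replicate attribute */
  RULE 'find-tagged' LIST 'tagged'
       WHERE XATTR('user.replicate') IS NOT NULL
  EOF
  mmapplypolicy /gpfs/cloud -P /tmp/tagged.pol -I defer -f /tmp/tagged -N managernodes -g /gpfs/cloud/.policytmp

The resulting file list (something like /tmp/tagged.list.tagged) could then be handed to whatever pushes the copies out via irods into the CEPH store - that last hop is the bit we haven't worked out.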
Simon On 26/10/2015, 02:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kallback-Rose, Kristy A" wrote: >Simon wrote recently in the GPFS UG Blog: "We also got into discussion on >backup and ILM, and I think its amazing how everyone does these things in >their own slightly different way. I think this might be an interesting >area for discussion over on the group mailing list. There's a lot of >options and different ways to do things!? > >Yes, please! I?m *very* interested in what others are doing. > >We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS >Integration?we have had HPSS for a very long time), but I?m interested >what others are doing with either ILM or other methods to brew their own >backup solutions, how much they are backing up and with what regularity, >what resources it takes, etc. > >If you have anything going on at your site that?s relevant, can you >please share? > >Thanks, >Kristy > >Kristy Kallback-Rose >Manager, Research Storage >Indiana University From wsawdon at us.ibm.com Mon Oct 26 21:12:55 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Mon, 26 Oct 2015 13:12:55 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <562DE7B5.7080303@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> Message-ID: <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> > From: Stephan Graf > > For backup we use mmbackup (dsmc) > for the user HOME directory (no ILM) > #120 mio files => 3 hours get candidate list + x hour backup That seems rather slow. What version of GPFS are you running? How many nodes are you using? Are you using a "-g global shared directory"? The original mmapplypolicy code was targeted to a single node, so by default it still runs on a single node and you have to specify -N to run it in parallel. When you run multi-node there is a "-g" option that defines a global shared directory that must be visible to all nodes specified in the -N list. Using "-g" with "-N" enables a scale-out parallel algorithm that substantially reduces the time for candidate selection. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Mon Oct 26 22:22:58 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Mon, 26 Oct 2015 14:22:58 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> > From: "Simon Thompson (Research Computing - IT Services)" > > Does anyone know how TSM backups handle XATTRs? TSM capture XATTRs and ACLs in an opaque "blob" using gpfs_fgetattrs. Unfortunately, TSM stores the opaque blob with the file data. Changes to the blob require the data to be backed up again. > Or even other attributes like immutability, Immutable files may be backed up and restored as immutable files. Immutability is restored after the data has been restored. > can you do a synthetic recreate of the TSM HSM tape from backups? TSM stores data from backups and data from HSM in different pools. A file that is both HSM'ed and backed up will have at least two copies of data off-line. I suspect that losing a tape from the HSM pool will have no effect on the backup pool, but you should verify that with someone from TSM. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From st.graf at fz-juelich.de Tue Oct 27 07:03:19 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Tue, 27 Oct 2015 08:03:19 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> Message-ID: <562F21B7.8040007@fz-juelich.de> We are running the mmbackup on an AIX system oslevel -s 6100-07-10-1415 Current GPFS build: "4.1.0.8 ". So we only use one node for the policy run. Stephan On 10/26/15 22:12, Wayne Sawdon wrote: > From: Stephan Graf > > For backup we use mmbackup (dsmc) > for the user HOME directory (no ILM) > #120 mio files => 3 hours get candidate list + x hour backup That seems rather slow. What version of GPFS are you running? How many nodes are you using? Are you using a "-g global shared directory"? The original mmapplypolicy code was targeted to a single node, so by default it still runs on a single node and you have to specify -N to run it in parallel. When you run multi-node there is a "-g" option that defines a global shared directory that must be visible to all nodes specified in the -N list. Using "-g" with "-N" enables a scale-out parallel algorithm that substantially reduces the time for candidate selection. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Tue Oct 27 09:02:52 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Tue, 27 Oct 2015 10:02:52 +0100 Subject: [gpfsug-discuss] Spectrum Scale v4.2 In-Reply-To: References: Message-ID: <201510270904.t9R940k4019623@d06av11.portsmouth.uk.ibm.com> see "IBM Spectrum Scale V4.2 delivers simple, efficient,and intelligent data management for highperformance,scale-out storage" http://www.ibm.com/common/ssi/rep_ca/8/897/ENUS215-398/ENUS215-398.PDF Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Tue Oct 27 10:47:43 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 27 Oct 2015 10:47:43 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> Message-ID: <1445942863.17909.89.camel@buzzard.phy.strath.ac.uk> On Mon, 2015-10-26 at 14:22 -0800, Wayne Sawdon wrote: [SNIP] > > > > can you do a synthetic recreate of the TSM HSM tape from backups? > > TSM stores data from backups and data from HSM in different pools. A > file that is both HSM'ed and backed up will have at least two copies > of data off-line. I suspect that losing a tape from the HSM pool will > have no effect on the backup pool, but you should verify that with > someone from TSM. > I am pretty sure that you have to restore the files first from backup, and it is a manual process. Least it was for me when a HSM tape went bad in the past. Had to use TSM to generate a list of the files on the HSM tape, and then feed that in to a dsmc restore, before doing a reconcile and removing the tape from the library for destruction. Finally all the files where punted back to tape. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From wsawdon at us.ibm.com Tue Oct 27 15:25:02 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Tue, 27 Oct 2015 07:25:02 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <562F21B7.8040007@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> Message-ID: <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 27 17:28:00 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 27 Oct 2015 17:28:00 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm Message-ID: Hi, If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. >From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. This appears to mean that quotas have to be double what we actually want to take account of the replication factor. Is this correct? Second part of the question. If a file is transferred to tape (or compressed maybe as well), does the file still count against quota, and how much for? As on hsm tape its no longer copies=2. Same for a compressed file, does the compressed file count as the original or compressed size against quota? I.e. Could a user accessing a compressed file suddenly go over quota by accessing the file? 
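A quick way to look at the first part of this on a single file, sketched here with a hypothetical path and assuming GNU coreutils for the --apparent-size option:

    ls -l /gpfs/fs0/somefile                  # logical length (st_size) recorded in the inode
    du -k /gpfs/fs0/somefile                  # allocated KB, which includes any replica copies
    du -k --apparent-size /gpfs/fs0/somefile  # logical size, ignoring allocation and replication
    mmlsattr -L /gpfs/fs0/somefile            # reports the file's data and metadata replication factors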
Thanks Simon From Robert.Oesterlin at nuance.com Tue Oct 27 19:48:04 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 27 Oct 2015 19:48:04 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs Message-ID: <4E539EE4-596B-441C-9E60-46072E567765@nuance.com> With Spectrum Scale 4.2 announced, can anyone from IBM comment on what the outlook/process is for fixes and PTFs? When 4.1.1 came out, 4.1.0.X more or less dies, with 4.1.0.8 being the last level ? yes? Then move to 4.1.1 With 4.1.1 ? we are now at 4.1.1-2 and 4.2 is going to GA on 11/20/2015 Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Wed Oct 28 08:06:01 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Wed, 28 Oct 2015 09:06:01 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> Message-ID: <563081E9.2090605@fz-juelich.de> Hi Wayne! We are using -g, and we only want to run it on one node, so we don't use the -N option. Stephan On 10/27/15 16:25, Wayne Sawdon wrote: > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dan.Foster at bristol.ac.uk Wed Oct 28 10:06:10 2015 From: Dan.Foster at bristol.ac.uk (Dan Foster) Date: Wed, 28 Oct 2015 10:06:10 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm In-Reply-To: References: Message-ID: On 27 October 2015 at 17:28, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. > > From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. 
> > This appears to mean that quotas have to be double what we actually want to take account of the replication factor. > > Is this correct? This is what we obverse here by default and currently have to double our fileset quotas to take this is to account on replicated filesystems. You've reminded me that I was going to ask this list if it's possible to report the un-replicated sizes? While the quota management is only a slight pain, what's reported to the user is more of a problem for us(e.g. via SMB share / df ). We're considering replicating a lot more of our filesystems and it would be useful if it didn't appear that everyones quotas had just doubled overnight. Thanks, Dan. -- Dan Foster | Senior Storage Systems Administrator | IT Services From duersch at us.ibm.com Wed Oct 28 12:47:52 2015 From: duersch at us.ibm.com (Steve Duersch) Date: Wed, 28 Oct 2015 08:47:52 -0400 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs Message-ID: >>Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? IBM will continue to create PTFs for the 4.1.1 stream. Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Oct 28 13:06:56 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 28 Oct 2015 13:06:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs In-Reply-To: References: Message-ID: Hi Steve Thanks ? that?s puzzling (surprising?) given that 4.1.1 hasn?t really been out that long. (less than 6 months) I?m in a position of deciding of what my upgrade path and timeline should be. If I?m at 4.1.0.X and want to upgrade all my clusters, the ?safer? bet is probably 4.1.1-X. but all the new features are going to end up on the 4.2.X. If 4.2 is going to GA in November, perhaps it?s better to wait for the first 4.2 PTF package. Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 From: > on behalf of Steve Duersch > Reply-To: gpfsug main discussion list > Date: Wednesday, October 28, 2015 at 7:47 AM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs IBM will continue to create PTFs for the 4.1.1 stream. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Oct 28 13:09:52 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 28 Oct 2015 13:09:52 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs In-Reply-To: References: Message-ID: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> All, What about the 4.1.0-x stream? We?re on 4.1.0-8 and will soon be applying an efix to it to take care of the snapshot deletion and ?quotas are wrong? bugs. We?ve also go no immediate plans to go to either 4.1.1-x or 4.2 until they?ve had a chance to ? mature. It?s not that big of a deal - I don?t mind running on the efix for a while. Just curious. Thanks? Kevin On Oct 28, 2015, at 7:47 AM, Steve Duersch > wrote: >>Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? IBM will continue to create PTFs for the 4.1.1 stream. 
Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Oct 28 13:15:30 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 28 Oct 2015 13:15:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs In-Reply-To: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> References: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05CF6CF0@CHI-EXCHANGEW1.w2k.jumptrading.com> IBM has stated that there will no longer be PTF releases for 4.1.0, and that 4.1.0-8 is the last PTF release. Thus you?ll have to choose between upgrading to 4.1.1 (which has the latest GPFS Protocols feature, hence the numbering change), or wait and go with the 4.2 release. I heard rumor from somebody at IBM (honestly can?t remember who) that the first 3 releases of any major release has some additional debugging turned up, which is turned off after on the fourth PTF release and those going forward. Does anybody at IBM want to confirm or deny this rumor? I?m also leery of going with the first major release of GPFS (or any software, like RHEL 7.0 for instance). Thanks, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: Wednesday, October 28, 2015 8:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs All, What about the 4.1.0-x stream? We?re on 4.1.0-8 and will soon be applying an efix to it to take care of the snapshot deletion and ?quotas are wrong? bugs. We?ve also go no immediate plans to go to either 4.1.1-x or 4.2 until they?ve had a chance to ? mature. It?s not that big of a deal - I don?t mind running on the efix for a while. Just curious. Thanks? Kevin On Oct 28, 2015, at 7:47 AM, Steve Duersch > wrote: >>Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? IBM will continue to create PTFs for the 4.1.1 stream. Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Oct 28 13:25:27 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 28 Oct 2015 13:25:27 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm In-Reply-To: References: Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05CF6E05@CHI-EXCHANGEW1.w2k.jumptrading.com> I'm not sure what kind of report you're looking for, but the `du` command has a "--apparent-size" option that has this description: print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in (?sparse?) files, internal fragmentation, indirect blocks, and the like This can be used to get the actual amount of space that files are using. I think that mmrepquota and mmlsquota show twice the amount of space of the actual file due to the replication, but somebody correct me if I'm mistaken. I also would like to know what the output of the ILM "LIST" policy reports for KB_ALLOCATED for replicated files. Is it the replicated amount of data? Thanks, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Dan Foster Sent: Wednesday, October 28, 2015 5:06 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas, replication and hsm On 27 October 2015 at 17:28, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. > > From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. > > This appears to mean that quotas have to be double what we actually want to take account of the replication factor. > > Is this correct? This is what we obverse here by default and currently have to double our fileset quotas to take this is to account on replicated filesystems. You've reminded me that I was going to ask this list if it's possible to report the un-replicated sizes? While the quota management is only a slight pain, what's reported to the user is more of a problem for us(e.g. via SMB share / df ). We're considering replicating a lot more of our filesystems and it would be useful if it didn't appear that everyones quotas had just doubled overnight. Thanks, Dan. -- Dan Foster | Senior Storage Systems Administrator | IT Services _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From wsawdon at us.ibm.com Wed Oct 28 13:36:27 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Wed, 28 Oct 2015 05:36:27 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <563081E9.2090605@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> <563081E9.2090605@fz-juelich.de> Message-ID: <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com> You have to use both options even if -N is only the local node. Sorry, -Wayne From: Stephan Graf To: Date: 10/28/2015 01:06 AM Subject: Re: [gpfsug-discuss] ILM and Backup Question Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Wayne! We are using -g, and we only want to run it on one node, so we don't use the -N option. Stephan On 10/27/15 16:25, Wayne Sawdon wrote: > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From wsawdon at us.ibm.com Wed Oct 28 14:11:25 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Wed, 28 Oct 2015 06:11:25 -0800 Subject: [gpfsug-discuss] Quotas, replication and hsm In-Reply-To: References: Message-ID: <201510281412.t9SEChQo030691@d01av03.pok.ibm.com> > From: "Simon Thompson (Research Computing - IT Services)" > > > Second part of the question. If a file is transferred to tape (or > compressed maybe as well), does the file still count against quota, > and how much for? As on hsm tape its no longer copies=2. Same for a > compressed file, does the compressed file count as the original or > compressed size against quota? I.e. 
Could a user accessing a > compressed file suddenly go over quota by accessing the file? > Quotas account for space in the file system. If you migrate a user's file to tape, then that user is credited for the space saved. If a later access recalls the file then the user is again charged for the space. Note that HSM recall is done as "root" which bypasses the quota check -- this allows the file to be recalled even if it pushes the user past his quota limit. Compression (which is currently in beta) has the same properties. If you compress a file, then the user is credited with the space saved. When the file is uncompressed the user is again charged. Since uncompression is done by the "user" the quota check is enforced and uncompression can fail. This includes writes to a compressed file. > From: Bryan Banister > > I also would like to know what the output of the ILM "LIST" policy > reports for KB_ALLOCATED for replicated files. Is it the replicated > amount of data? > KB_ALLOCATED shows the same value that stat shows, So yes it shows the replicated amount of data actually used by the file. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Oct 28 14:48:11 2015 From: makaplan at us.ibm.com (makaplan at us.ibm.com) Date: Wed, 28 Oct 2015 09:48:11 -0500 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> <563081E9.2090605@fz-juelich.de> <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com> Message-ID: <201510281448.t9SEmFsr030044@d01av02.pok.ibm.com> IF you see one or more status messages like this: [I] %2$s Parallel-piped sort and policy evaluation. %1$llu files scanned. %3$s Then you are getting the (potentially) fastest version of the GPFS inode and policy scanning algorithm. You may also want to adjust the -a and -A options of the mmapplypolicy command, as mentioned in the command documentation. Oh I see the documentation for -A is wrong in many versions of the manual. There is an attempt to automagically estimate the proper number of buckets, based on the inodes allocated count. If you want to investigate performance more I recommend you use our debug option -d 7 or set environment variable MM_POLICY_DEBUG_BITS=7 - this will show you how the work is divided among the nodes and threads. --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Thu Oct 29 14:14:58 2015 From: knop at us.ibm.com (Felipe Knop) Date: Thu, 29 Oct 2015 09:14:58 -0500 Subject: [gpfsug-discuss] Intro (new member) Message-ID: Hi, I have just joined the GPFS (Spectrum Scale) UG list. I work in the GPFS development team. I had the chance of attending the "Inaugural USA Meet the Devs" session in New York City on Oct 7, which was a valuable opportunity to hear from customers using the product. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carlz at us.ibm.com Fri Oct 30 15:14:50 2015 From: carlz at us.ibm.com (Carl Zetie) Date: Fri, 30 Oct 2015 10:14:50 -0500 Subject: [gpfsug-discuss] Making an RFE Public (and an intro) Message-ID: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com> First the intro: I am the new Product Manager joining the Spectrum Scale team, taking the place of Janet Ellsworth. I'm looking forward to meeting with you all. I also have some news about RFEs: we are working to enable you to choose whether your RFEs for Scale are private or public. I know that many of you have requested public RFEs so that other people can see and vote on RFEs. We'd like to see that too as it's very valuable information for us (as well as reducing duplicates). So here's what we're doing: Short term: If you have an existing RFE that you would like to see made Public, please email me with the ID of the RFE. You can find my email address at the foot of this message. PLEASE don't email the entire list! Medium term: We are working to allow you to choose at the time of submission whether a request will be Private or Public. Unfortunately for technical internal reasons we can't simply make the Public / Private field selectable at submission time (don't ask!), so instead we are creating two submission queues, one for Private RFEs and another for public RFEs. So when you submit an RFE in future you'll start by selecting the appropriate queue. Inside IBM, they all go into the same evaluation process. As soon as I have an update on the availability of this fix, I will share with the group. Note that even for Public requests, some fields remain Private and hidden from other viewers, e.g. Business Case (look for the "key" icon next to the field to confirm). regards, Carl Carl Zetie Product Manager for Spectrum Scale, IBM (540) 882 9353 ][ 15750 Brookhill Ct, Waterford VA 20197 carlz at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfhamano at us.ibm.com Fri Oct 30 15:29:58 2015 From: jfhamano at us.ibm.com (John Hamano) Date: Fri, 30 Oct 2015 07:29:58 -0800 Subject: [gpfsug-discuss] Making an RFE Public (and an intro) In-Reply-To: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com> References: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com> Message-ID: <201510301530.t9UFUM0M004729@d03av05.boulder.ibm.com> Hi Carl, welcome and congratulations on your new role. I am North America Brand Sales for ESS and Spectrum Scale. Let me know when you have some time next weekg to talk. From: Carl Zetie/Fairfax/IBM at IBMUS To: gpfsug-discuss at spectrumscale.org, Date: 10/30/2015 08:20 AM Subject: [gpfsug-discuss] Making an RFE Public (and an intro) Sent by: gpfsug-discuss-bounces at spectrumscale.org First the intro: I am the new Product Manager joining the Spectrum Scale team, taking the place of Janet Ellsworth. I'm looking forward to meeting with you all. I also have some news about RFEs: we are working to enable you to choose whether your RFEs for Scale are private or public. I know that many of you have requested public RFEs so that other people can see and vote on RFEs. We'd like to see that too as it's very valuable information for us (as well as reducing duplicates). So here's what we're doing: Short term: If you have an existing RFE that you would like to see made Public, please email me with the ID of the RFE. You can find my email address at the foot of this message. PLEASE don't email the entire list! 
Medium term: We are working to allow you to choose at the time of submission whether a request will be Private or Public. Unfortunately for technical internal reasons we can't simply make the Public / Private field selectable at submission time (don't ask!), so instead we are creating two submission queues, one for Private RFEs and another for public RFEs. So when you submit an RFE in future you'll start by selecting the appropriate queue. Inside IBM, they all go into the same evaluation process. As soon as I have an update on the availability of this fix, I will share with the group. Note that even for Public requests, some fields remain Private and hidden from other viewers, e.g. Business Case (look for the "key" icon next to the field to confirm). regards, Carl Carl Zetie Product Manager for Spectrum Scale, IBM (540) 882 9353 ][ 15750 Brookhill Ct, Waterford VA 20197 carlz at us.ibm.com_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From PATBYRNE at uk.ibm.com Thu Oct 1 11:09:29 2015 From: PATBYRNE at uk.ibm.com (Patrick Byrne) Date: Thu, 1 Oct 2015 10:09:29 +0000 Subject: [gpfsug-discuss] Problem Determination Message-ID: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Oct 1 13:39:25 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 1 Oct 2015 12:39:25 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. 
- The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Fri Oct 2 17:44:24 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Fri, 2 Oct 2015 16:44:24 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05C8CE44@CHI-EXCHANGEW1.w2k.jumptrading.com> I would like to strongly echo what Bob has stated, especially the documentation or wrong documentation, and I have in-lining some comments below. I liken GPFS to a critical care patient at the hospital. You have to check on the state regularly, know the running heart rate (e.g. waiters), the response of every component from disk, to networks, to server load, etc. 
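A small sketch of that kind of routine vital-signs check, suitable for running from cron; the file system name and cut-offs are arbitrary examples rather than recommendations:

    #!/bin/bash
    # Poll a few basic GPFS health indicators once and print them.
    export PATH=$PATH:/usr/lpp/mmfs/bin
    date
    mmgetstate -a                # daemon state on every node in the cluster
    mmlsdisk fs0 -e              # any disks in fs0 (hypothetical name) not up/ready
    mmdiag --waiters | head -20  # a sample of the currently outstanding local waiters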
When a problem occurs, running tests (such as nsdperf) to help isolate the problem quickly is crucial. Capturing GPFS trace data is also very important if the problem isn?t obvious. But then you have to wait for IBM support to parse the information and give you their analysis of the situation. It would be great to get an advanced troubleshooting document that describes how to read the output of `mmfsadm dump` commands and the GPFS trace report that is generated. Cheers, -Bryan From: gpfsug-discuss-bounces at gpfsug.org [mailto:gpfsug-discuss-bounces at gpfsug.org] On Behalf Of Oesterlin, Robert Sent: Thursday, October 01, 2015 7:39 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem Determination Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. [Bryan: Also please, please provide a way to check whether or not the configuration parameters need to be changed. I assume that there is a `mmfsadm dump` command that can tell you whether the config parameter needs to be changed, if not make one! Just stating something like ?This could be increased to XX value for very large clusters? is not very helpful. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. [Bryan: I know that Scott Fadden is a busy man, so I would recommend helping distribute the workload of maintaining the wiki documentation. This data should be reviewed on a more regular basis, at least once for each major release I would hope, and updated or deleted if found to be out of date.] - The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. [Bryan: From what I?ve heard, IBM is actively working to make the deadlock amelioration logic better. 
I agree that firing off traces can cause more problems, and we have turned off the automated collection as well. We are going to work on enabling the collection of some data during these events to help ensure we get enough data for IBM to analyze the problem.] - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. [Bryan: The GPFS callback facilities are very useful for setting up alerts, but not well documented or advertised by the GPFS manuals. I hope to see more callback capabilities added to help monitor all aspects of the GPFS cluster and file systems] mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. 
This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 2 17:58:41 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 2 Oct 2015 16:58:41 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com>, Message-ID: I agree on docs, particularly on mmdiag, I think things like --lroc are not documented. I'm also not sure that --network always gives accurate network stats. (we were doing some ha failure testing where we have split site in and fabrics, yet the network counters didn't change even when the local ib nsd servers were shut down). It would be nice also to have a set of Icinga/Nagios plugins from IBM, maybe in samples whcich are updated on each release with new feature checks. And not problem determination, but id really like to see an inflight non disruptive upgrade path. Particularly as we run vms off gpfs, its bot always practical or possible to move vms, so would be nice to have upgrade in flight (not suggesting this would be a quick thing to implement). Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Oesterlin, Robert [Robert.Oesterlin at nuance.com] Sent: 01 October 2015 13:39 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Problem Determination Hi Patrick I was going to mail you directly ? but this may help spark some discussion in this area. GPFS (pardon the use of the ?old school" term ? You need something easier to type that Spectrum Scale) problem determination is one of those areas that is (sometimes) more of an art than a science. IBM publishes a PD guide, and it?s a good start but doesn?t cover all the bases. - In the GPFS log (/var/mmfs/gen/mmfslog) there are a lot of messages generated. I continue to come across ones that are not documented ? or documented poorly. EVERYTHING that ends up in ANY log needs to be documented. - The PD guide gives some basic things to look at for many of the error messages, but doesn?t go into alternative explanation for many errors. Example: When a node gets expelled, the PD guide tells you it?s a communication issue, when it fact in may be related to other things like Linux network tuning. Covering all the possible causes is hard, but you can improve this. - GPFS waiter information ? understanding and analyzing this is key to getting to the bottom of many problems. The waiter information is not well documented. You should include at least a basic guide on how to use waiter information in determining cluster problems. Related: Undocumented config options. You can come across some by doing ?mmdiag ?config?. Using some of these can help you ? or get you in trouble in the long run. If I can see the option, document it. - Make sure that all information I might come across online is accurate, especially on those sites managed by IBM. The Developerworks wiki has great information, but there is a lot of information out there that?s out of date or inaccurate. This leads to confusion. - The automatic deadlock detection implemented in 4.1 can be useful, but it also can be problematic in a large cluster when you get into problems. 
Firing off traces and taking dumps in an automated manner can cause more problems if you have a large cluster. I ended up turning it off. - GPFS doesn?t have anything setup to alert you when conditions occur that may require your attention. There are some alerting capabilities that you can customize, but something out of the box might be useful. I know there is work going on in this area. mmces ? I did some early testing on this but haven?t had a chance to upgrade my protocol nodes to the new level. Upgrading 1000?s of node across many cluster is ? challenging :-) The newer commands are a great start. I like the ability to list out events related to a particular protocol. I could go on? Feel free to contact me directly for a more detailed discussion: robert.oesterlin @ nuance.com Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Patrick Byrne Reply-To: gpfsug main discussion list Date: Thursday, October 1, 2015 at 5:09 AM To: "gpfsug-discuss at gpfsug.org" Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited From ewahl at osc.edu Fri Oct 2 19:00:46 2015 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 2 Oct 2015 18:00:46 +0000 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> Message-ID: <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> I'm not yet in the 4.x release stream so this may be taken with a grain (or more) of salt as we say. PLEASE keep the ability of commands to set -x or dump debug when the env DEBUG=1 is set. This has been extremely useful over the years. Granted I've never worked out why sometimes we see odd little things like machines deciding they suddenly need an FPO license or one nsd server suddenly decides it's name is part of the FQDN instead of just it's hostname and only for certain commands, but it's DAMN useful. Minor issues especially can be tracked down with it. Undocumented features and logged items abound. I'd say start there. 
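For anyone who has not used that trick, a hedged example of what it looks like in practice, assuming the mm* scripts still honour DEBUG=1 as described; the command and output file are arbitrary choices:

    DEBUG=1 mmlscluster 2> /tmp/mmlscluster.trace   # the shell 'set -x' trace goes to stderr
    less /tmp/mmlscluster.trace                     # read back the expanded commands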
This is one area where it is definitely more art than science with Spectrum Scale (meh GPFS still sounds better. So does Shark. Can we go back to calling it the Shark Server Project?) Complete failure of the verbs layer and fallback to other defined networks would be nice to know about during operation. It's excellent about telling you at startup but not so much during operation, at least in 3.5. I imagine with the 'automated compatibility layer building' I'll be looking for some serious amounts of PD for the issues we _will_ see there. We frequently build against kernels we are not yet running at this site, so this needs well documented PD and resolution. Ed Wahl OSC ________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Patrick Byrne [PATBYRNE at uk.ibm.com] Sent: Thursday, October 01, 2015 6:09 AM To: gpfsug-discuss at gpfsug.org Subject: [gpfsug-discuss] Problem Determination Hi all, As I'm sure some of you aware, problem determination is an area that we are looking to try and make significant improvements to over the coming releases of Spectrum Scale. To help us target the areas we work to improve and make it as useful as possible I am trying to get as much feedback as I can about different problems users have, and how people go about solving them. I am interested in hearing everything from day to day annoyances to problems that have caused major frustration in trying to track down the root cause. Where possible it would be great to hear how the problems were dealt with as well, so that others can benefit from your experience. Feel free to reply to the mailing list - maybe others have seen similar problems and could provide tips for the future - or to me directly if you'd prefer (patbyrne at uk.ibm.com). On a related note, in 4.1.1 there was a component added that monitors the state of the various protocols that are now supported (NFS, SMB, Object). The output from this is available with the 'mmces state' and 'mmces events' CLIs and I would like to get feedback from anyone who has had the chance make use of this. Is it useful? How could it be improved? We are looking at the possibility of extending this component to cover more than just protocols, so any feedback would be greatly appreciated. Thanks in advance, Patrick Byrne IBM Spectrum Scale - Development Engineer IBM Systems - Manchester Lab IBM UK Limited -------------- next part -------------- An HTML attachment was scrubbed... URL: From zgiles at gmail.com Fri Oct 2 21:27:17 2015 From: zgiles at gmail.com (Zachary Giles) Date: Fri, 2 Oct 2015 16:27:17 -0400 Subject: [gpfsug-discuss] Problem Determination In-Reply-To: <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> References: <201510011010.t91AASm6029240@d06av08.portsmouth.uk.ibm.com> <9DA9EC7A281AC7428A9618AFDC49049955AEB4DF@CIO-KRC-D1MBX02.osuad.osu.edu> Message-ID: I would like to see better performance metrics / counters from GPFS. I know we already have mmpmon, which is generally really good -- I've done some fun things with it and it has been a great tool. And, I realize that there is supposedly a new monitoring framework in 4.x.. which I haven't played with yet. But, Generally it would be extremely helpful to get synchronized (across all nodes) high accuracy counters of data flow, number of waiters, page pool stats, distribution of data from one layer to another down to NSDs.. etc etc etc. 
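As a concrete starting point for that kind of collection, a hedged sketch of driving mmpmon non-interactively; the request file, delay and repeat count are illustrative only:

    # Sample local per-file-system I/O counters six times at 10-second
    # intervals, in machine-parseable (-p) form. Adding an "nlist add node1
    # node2" request to the same file would collect from a list of nodes.
    echo fs_io_s > /tmp/mmpmon.cmd
    /usr/lpp/mmfs/bin/mmpmon -p -i /tmp/mmpmon.cmd -r 6 -d 10000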
I believe many of these counters already exist, but they're hidden in some mmfsadm xx command that one needs to troll through with possible performance implications. mmpmon can do some of this, but it's only a handful of counters, it's hard to say how synchronized the counters are across nodes, and I've personally seen an mmpmon run go bad and take down a cluster. It would be nice if it were pushed out, or provided in a safe manner with the design and expectation of "log-everything forever continuously". As GSS/ESS systems start popping up, I realize they have this other monitoring framework to watch the VD throughputs.. which is great. But, that doesn't allow us to monitor more traditional types. Would be nice to monitor it all together the same way so we don't miss-out on monitoring half the infrastructure or buying a cluster with some fancy GUI that can't do what we want.. -Zach On Fri, Oct 2, 2015 at 2:00 PM, Wahl, Edward wrote: > I'm not yet in the 4.x release stream so this may be taken with a grain (or > more) of salt as we say. > > PLEASE keep the ability of commands to set -x or dump debug when the env > DEBUG=1 is set. This has been extremely useful over the years. Granted > I've never worked out why sometimes we see odd little things like machines > deciding they suddenly need an FPO license or one nsd server suddenly > decides it's name is part of the FQDN instead of just it's hostname and only > for certain commands, but it's DAMN useful. Minor issues especially can be > tracked down with it. > > Undocumented features and logged items abound. I'd say start there. This > is one area where it is definitely more art than science with Spectrum Scale > (meh GPFS still sounds better. So does Shark. Can we go back to calling it > the Shark Server Project?) > > Complete failure of the verbs layer and fallback to other defined networks > would be nice to know about during operation. It's excellent about telling > you at startup but not so much during operation, at least in 3.5. > > I imagine with the 'automated compatibility layer building' I'll be looking > for some serious amounts of PD for the issues we _will_ see there. We > frequently build against kernels we are not yet running at this site, so > this needs well documented PD and resolution. > > Ed Wahl > OSC > > > ________________________________ > From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] > on behalf of Patrick Byrne [PATBYRNE at uk.ibm.com] > Sent: Thursday, October 01, 2015 6:09 AM > To: gpfsug-discuss at gpfsug.org > Subject: [gpfsug-discuss] Problem Determination > > Hi all, > > As I'm sure some of you aware, problem determination is an area that we are > looking to try and make significant improvements to over the coming releases > of Spectrum Scale. To help us target the areas we work to improve and make > it as useful as possible I am trying to get as much feedback as I can about > different problems users have, and how people go about solving them. > > I am interested in hearing everything from day to day annoyances to problems > that have caused major frustration in trying to track down the root cause. > Where possible it would be great to hear how the problems were dealt with as > well, so that others can benefit from your experience. Feel free to reply to > the mailing list - maybe others have seen similar problems and could provide > tips for the future - or to me directly if you'd prefer > (patbyrne at uk.ibm.com). 
> > On a related note, in 4.1.1 there was a component added that monitors the > state of the various protocols that are now supported (NFS, SMB, Object). > The output from this is available with the 'mmces state' and 'mmces events' > CLIs and I would like to get feedback from anyone who has had the chance > make use of this. Is it useful? How could it be improved? We are looking at > the possibility of extending this component to cover more than just > protocols, so any feedback would be greatly appreciated. > > Thanks in advance, > > Patrick Byrne > IBM Spectrum Scale - Development Engineer > IBM Systems - Manchester Lab > IBM UK Limited > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at gpfsug.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Zach Giles zgiles at gmail.com From Luke.Raimbach at crick.ac.uk Mon Oct 5 13:57:14 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Mon, 5 Oct 2015 12:57:14 +0000 Subject: [gpfsug-discuss] Independent Inode Space Limit Message-ID: Hi All, When creating an independent inode space, I see the valid range for the number of inodes is between 1024 and 4294967294. Is the ~4.2billion upper limit something that can be increased in the future? I also see that the first 1024 inodes are immediately allocated upon creation. I assume these are allocated to internal data structures and are a copy of a subset of the first 4038 inodes allocated for new file systems? It would be useful to know if these internal structures are fixed for independent filesets and if they are not, what factors determine their layout (for performance purposes). Many Thanks, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From usa-principal at gpfsug.org Mon Oct 5 14:55:15 2015 From: usa-principal at gpfsug.org (usa-principal-gpfsug.org) Date: Mon, 05 Oct 2015 09:55:15 -0400 Subject: [gpfsug-discuss] Final Reminder: Inaugural US "Meet the Developers" Message-ID: <9656d0110c2be4b339ec5ce662409b8e@webmail.gpfsug.org> A last reminder to check in with Janet if you have not done so already. Looking forward to this event on Wednesday this week. Best, Kristy --- Hello Everyone, Here is a reminder about our inaugural US "Meet the Developers" session. Details are below, and please send an e-mail to Janet Ellsworth (janetell at us.ibm.com) by next Friday September 18th if you wish to attend. Janet is on the product management team for Spectrum Scale and is helping with the logistics for this first event. Date: Wednesday, October 7th Place: IBM building at 590 Madison Avenue, New York City Time: 12:30 to 5 PM (Lunch will be served at 12:30, and sessions will start between 1 and 1:30 PM. Afternoon snacks will be served as well :-) Agenda IBM development architect to present the new protocols support that was released with Spectrum Scale 4.1.1 in June. IBM developer to demo future Graphical User Interface ***Member of user community to present an experience with using Spectrum Scale (still seeking volunteers for this !)*** Open Q&A with the development team We are happy to have heard from many of you so far who would like to attend. 
We still have room however, so please get in touch by the 9/18 date if you would like to attend. ***We also need someone to share an experience or use case scenario with Spectrum Scale for this event, so please let Janet know if you are willing to do that too.*** As you have likely seen, we are also working on the agenda and timing for day-long GPFS US UG event in Austin during November aligned with SC15 and there will be more details on that coming soon. From secretary at gpfsug.org Wed Oct 7 12:50:51 2015 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Wed, 07 Oct 2015 12:50:51 +0100 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: <813d82bd5074b90c3a67acc85a03995b@webmail.gpfsug.org> Hi All, There are still places available for the next 'Meet the Devs' event in Edinburgh on Friday 23rd October from 10:30/11am until 3/3:30pm. It's a great opportunity for you to meet with developers and talk through specific issues as well as learn more from the experts. Location: Room 2009a, Information Services, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD Google maps link: https://goo.gl/maps/Ta7DQ Agenda: - GUI - 4.2 Updates/show and tell - Open conversation on any areas of interest attendees may have Lunch and refreshments will be provided. Please email me (secretary at gpfsug.org) if you would like to attend including any particular topics of interest you would like to discuss. Best wishes, -- Claire O'Toole GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org From service at metamodul.com Wed Oct 7 16:06:56 2015 From: service at metamodul.com (service at metamodul.com) Date: Wed, 07 Oct 2015 17:06:56 +0200 Subject: [gpfsug-discuss] Places available: Meet the Devs Message-ID: Hi Claire, I will attend the meeting. Hans-Joachim Ehlers MetaModul GmbH Germany Cheers Hajo Von Samsung Mobile gesendet
-------- Original message --------
From: Secretary GPFS UG
Date: 2015.10.07 13:50 (GMT+01:00)
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Places available: Meet the Devs
Hi All, There are still places available for the next 'Meet the Devs' event in Edinburgh on Friday 23rd October from 10:30/11am until 3/3:30pm. It's a great opportunity for you to meet with developers and talk through specific issues as well as learn more from the experts. Location: Room 2009a, Information Services, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD Google maps link: https://goo.gl/maps/Ta7DQ Agenda: - GUI - 4.2 Updates/show and tell - Open conversation on any areas of interest attendees may have Lunch and refreshments will be provided. Please email me (secretary at gpfsug.org) if you would like to attend including any particular topics of interest you would like to discuss. Best wishes, -- Claire O'Toole GPFS User Group Secretary +44 (0)7508 033896 www.gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Wed Oct 7 19:59:26 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Wed, 7 Oct 2015 18:59:26 +0000 Subject: [gpfsug-discuss] new member, first post Message-ID: sitting here in the US GPFS UG meeting in NYC and just found out about this list. We've been a GPFS user for many years, first with integrated DDN support, but now also with a GSS system. we have about 4PB of raw GPFS storage and 1 billion inodes. We keep our metadata on TMS ramsan for very fast policy execution for tiering and migration. We use GPFS to hold the primary source data from our custom supercomputers. We have many policies executed periodically for managing the data, including writing certain files to dedicated fast pools and then migrating the data off to wide swaths of disk for read access from cluster clients. One pain point, which I'm sure many of the rest of you have seen, restripe operations for just metadata are unnecessarily slow. If we experience a flash module failure and need to restripe, it also has to check all of the data. I have a feature request open to make metadata restripes only look at metadata (since it is on RamSan/FlashCache, this should be very fast) instead of scanning everything, which can and does take months with performance impacts. Doug Hughes D. E. Shaw Research, LLC. Sent from my android device. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at gpfsug.org Thu Oct 8 20:37:05 2015 From: chair at gpfsug.org (GPFS UG Chair (Simon Thompson)) Date: Thu, 08 Oct 2015 20:37:05 +0100 Subject: [gpfsug-discuss] User group update Message-ID: Hi, I thought I'd drop an update to the group on various admin things which have been going on behind the scenes. The first US meet the devs event was held yesterday, and I'm hoping someone who went will be preparing a blog post to cover the event a little. I know a bunch of people have joined the mailing list since then, so welcome to the group to all of those! ** User Group Engagement with IBM ** I also met with Akhtar yesterday who is the IBM VP for Technical Computing Developments (which includes Spectrum Scale). He was in the UK for a few days at the IBM Manchester Labs, so we managed to squeeze a meeting to talk a bit about the UG. I'm very pleased that Akhtar confirmed IBMs commitment to help the user group in both the UK and USA with developer support for the meet the devs and annual group meetings. 
I'd like to extend my thanks to those at IBM who are actively supporting the group in so many ways. One idea we have been mulling over is filming the talks at next year's events and then putting those on Youtube for people who can't get there. IBM have given us tentative agreement to do this, subject to a few conditions. Most importantly that the UG and IBM ensure we don't publish customer or IBM items which are NDA/not for general public consumption. I'm hopeful we can get this all approved and if we do, we'll be looking to the community to help us out (anyone got digital camera equipment we might be able to borrow, or some help with editing down afterwards?) Whilst in Manchester I also met with Patrick to talk over the various emails people have sent in about problem determination, which Patrick will be taking to the dev meeting in a few weeks. It sounds like there are some interesting ideas kicking about, so hopefully we'll get some value from the user group input. Some of the new features in 4.2 were also demo'd and for those who might not have been to a meet the devs session and are interested in the upcoming GUI, it is now in public beta, head over to developer works for more details: https://www.ibm.com/developerworks/community/forums/html/topic?id=4dc34bf1- 17d1-4dc0-af72-6dc5a3f93e82&ps=25 ** User Group Feedback ** Over the past few months, I've also been collecting feedback from people, either comments on the mailing list, or those who I've spoken to, which was all collated and sent in to IBM, we'll hopefully be getting some feedback on that in the next few weeks - there's a bunch of preliminary answers now, but a few places we still need a bit of clarification. There's also some longer term discussion going on about GPFS and cloud (in particular to those of us in scientific areas). We'll feed that back as and when we get responses we can share. We'd like to ensure that we gather as much feedback from users so that we can collectively take it to IBM, so please do continue to post comments etc to the mailing list. ** Diary Dates ** A few dates for diaries: * Meet the Devs in Edinburgh - Friday 23rd October 2015 * GPFS UG Meeting @ SC15 in Austin, USA - Sunday 15th November 2015 * GPFS UG Meeting @ Computing Insight UK, Coventry, UK - Tuesday 8th December 2015 (Note you must be registered also for CIUK) * GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May 2016 ** User Group Admin ** Within the committee, we've been talking about how we can extend the reach of the group, so we may be reaching out to a few group members to take this forward. Of course if anyone has suggestions on how we can ensure we reach as many people as possible, please let me know, either via the mailing list of directly by email. I know there are lot of people on the mailing list who don't post (regularly), so I'd be interested to hear if you find the group mailing list discussion useful, if you feel there are barriers to asking questions, or what you'd like to see coming out of the user group - please feel free to email me directly if you'd like to comment on any of this! We've also registered spectrumscale.org to point to the user group, so you may start to see the group marketed as the Spectrum Scale User Group, but rest assured, its still the same old GPFS User Group ;-) Just a reminder that we made the mailing list so that only members can post. This was to reduce the amount of spam coming in and being held for moderation (and a few legit posts got lost this way). 
If you do want to post, but not receive the emails, you can set this as an option in the mailing list software. Finally, I've also fixed the mailing list archives, so these are now available at: http://www.gpfsug.org/pipermail/gpfsug-discuss/ Simon GPFS UG, UK Chair From L.A.Hurst at bham.ac.uk Fri Oct 9 09:25:52 2015 From: L.A.Hurst at bham.ac.uk (Laurence Alexander Hurst (IT Services)) Date: Fri, 9 Oct 2015 08:25:52 +0000 Subject: [gpfsug-discuss] User group update Message-ID: On 08/10/2015 20:37, "gpfsug-discuss-bounces at gpfsug.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May >2016 Daft question: is that 17th *and* 18th or 17th *or* 18th (presumably TBC)? Thanks, Laurence -- Laurence Hurst Research Support, IT Services, University of Birmingham From S.J.Thompson at bham.ac.uk Fri Oct 9 10:00:11 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 9 Oct 2015 09:00:11 +0000 Subject: [gpfsug-discuss] User group update In-Reply-To: References: Message-ID: Both days. May 2016 is a two day event. Simon ________________________________________ From: gpfsug-discuss-bounces at gpfsug.org [gpfsug-discuss-bounces at gpfsug.org] on behalf of Laurence Alexander Hurst (IT Services) [L.A.Hurst at bham.ac.uk] Sent: 09 October 2015 09:25 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] User group update On 08/10/2015 20:37, "gpfsug-discuss-bounces at gpfsug.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >GPFS UG Meeting May 2015 - IBM South Bank, London, UK- 17th/18th May >2016 Daft question: is that 17th *and* 18th or 17th *or* 18th (presumably TBC)? Thanks, Laurence -- Laurence Hurst Research Support, IT Services, University of Birmingham _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at gpfsug.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Sat Oct 10 14:54:22 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sat, 10 Oct 2015 13:54:22 +0000 Subject: [gpfsug-discuss] User group update Message-ID: > >We've also registered spectrumscale.org to point to the user group, so you >may start to see the group marketed as the Spectrum Scale User Group, but >rest assured, its still the same old GPFS User Group ;-) And this is just a test mail to ensure that mail to gpfsug-discuss at spectrumscale.org gets through OK. The old address should also still work. Simon From S.J.Thompson at bham.ac.uk Sat Oct 10 14:55:55 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Sat, 10 Oct 2015 13:55:55 +0000 Subject: [gpfsug-discuss] User group update In-Reply-To: References: Message-ID: On 10/10/2015 14:54, "Simon Thompson (Research Computing - IT Services)" wrote: >> >>We've also registered spectrumscale.org to point to the user group, so >>you >>may start to see the group marketed as the Spectrum Scale User Group, but >>rest assured, its still the same old GPFS User Group ;-) > >And this is just a test mail to ensure that mail to >gpfsug-discuss at spectrumscale.org gets through OK. The old address should >also still work. And checking the old address still works fine as well. 
Simon From Robert.Oesterlin at nuance.com Tue Oct 13 03:03:45 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 13 Oct 2015 02:03:45 +0000 Subject: [gpfsug-discuss] User group Meeting at SC15 - Registration Message-ID: We?d like to have all those attending the user group meeting at SC15 to register ? details are below. Thanks to IBM for getting the space and arranging all the details. I?ll post a more detailed agenda soon. Looking forward to meeting everyone! Location: JW Marriott 110 E 2nd Street Austin, Texas United States Date and Time: Sunday Nov 15, 1:00 PM?5:30 PM Agenda: - Latest IBM Spectrum Scale enhancements - Future directions and roadmap* (NDA required) - Newer usecases and User presentations Registration: Please register at the below link to book your seat. https://www-950.ibm.com/events/wwe/grp/grp017.nsf/v17_agenda?openform&seminar=99QNTNES&locale=en_US&S_TACT=sales Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Sat Oct 17 20:51:50 2015 From: chair at spectrumscale.org (GPFS UG Chair (Simon Thompson)) Date: Sat, 17 Oct 2015 20:51:50 +0100 Subject: [gpfsug-discuss] Blog on USA Meet the Devs Message-ID: Hi All, Kirsty wrote a blog post on the inaugural meet the devs in the USA. You can find it here: http://www.spectrumscale.org/inaugural-usa-meet-the-devs/ Thanks to Kristy, Bob and Pallavi for organising, the IBM devs and the group members giving talks. Simon From Tomasz.Wolski at ts.fujitsu.com Wed Oct 21 15:23:54 2015 From: Tomasz.Wolski at ts.fujitsu.com (Wolski, Tomasz) Date: Wed, 21 Oct 2015 16:23:54 +0200 Subject: [gpfsug-discuss] Intro Message-ID: Hi All, My name is Tomasz Wolski and I?m development engineer at Fujitsu Technology Solutions in Lodz, Poland. We?ve been using GPFS in our main product, which is ETERNUS CS8000, for many years now. GPFS helps us to build a consolidation of backup and archiving solutions for our end customers. We make use of GPFS snapshots, NIFS/CIFS services, GPFS API for our internal components and many many more .. :) My main responsibility, except developing new features for our system, is integration new GPFS versions into our system and bug tracking GPFS issues. Best regards, Tomasz Wolski -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 23 15:04:49 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Fri, 23 Oct 2015 14:04:49 +0000 Subject: [gpfsug-discuss] Independent Inode Space Limit Message-ID: >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? 
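For anyone wanting to poke at this in the meantime: the per-fileset limit is whatever was set at creation time and can be raised later, and the current allocation is visible per inode space. A minimal sketch, assuming a file system called gpfs0 and a fileset called fset1 (both names made up):

  # create an independent fileset with a 10M inode limit, 1M preallocated
  mmcrfileset gpfs0 fset1 --inode-space new --inode-limit 10485760:1048576
  # raise the limit later if it starts to fill
  mmchfileset gpfs0 fset1 --inode-limit 20971520
  # show per-fileset inode space, allocated and maximum inodes
  mmlsfileset gpfs0 -L
  mmdf gpfs0 -F

None of that answers the real question though, which is whether the ~4.2 billion ceiling itself can be lifted, or what the first 1024 inodes are used for.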
Thanks Simon From sfadden at us.ibm.com Fri Oct 23 13:42:14 2015 From: sfadden at us.ibm.com (Scott Fadden) Date: Fri, 23 Oct 2015 07:42:14 -0500 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: References: Message-ID: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> GPFS limits the max inodes based on metadata space. Add more metadata space and you should be able to add more inodes. Scott Fadden Spectrum Scale - Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/storage/spectrum/scale From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 10/23/2015 09:05 AM Subject: Re: [gpfsug-discuss] Independent Inode Space Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sfadden at us.ibm.com Fri Oct 23 13:42:14 2015 From: sfadden at us.ibm.com (Scott Fadden) Date: Fri, 23 Oct 2015 07:42:14 -0500 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: References: Message-ID: <201510231442.t9NEgQ0M024262@d01av05.pok.ibm.com> GPFS limits the max inodes based on metadata space. Add more metadata space and you should be able to add more inodes. Scott Fadden Spectrum Scale - Technical Marketing Phone: (503) 880-5833 sfadden at us.ibm.com http://www.ibm.com/systems/storage/spectrum/scale From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 10/23/2015 09:05 AM Subject: Re: [gpfsug-discuss] Independent Inode Space Limit Sent by: gpfsug-discuss-bounces at spectrumscale.org >When creating an independent inode space, I see the valid range for the >number of inodes is between 1024 and 4294967294. > >Is the ~4.2billion upper limit something that can be increased in the >future? > >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Anyone have any thoughts on this? Anyone from IBM know? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From wsawdon at us.ibm.com Fri Oct 23 16:25:33 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Fri, 23 Oct 2015 08:25:33 -0700 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> References: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> Message-ID: <201510231525.t9NFPr1G010768@d03av04.boulder.ibm.com> >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Independent filesets don't have the internal structures that the file system has. Other than the fileset's root directory all of the remaining inodes can be allocated to user files. Inodes are always allocated in full metadata blocks. The inodes for an independent fileset are allocated in their own blocks. This makes fileset snapshots more efficient, since a copy-on-write of the block of inodes will only copy inodes in the fileset. The inode blocks for all filesets are in the same inode file, but the blocks for each independent fileset are strided, making them easy to prefetch for policy scans. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Fri Oct 23 16:25:33 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Fri, 23 Oct 2015 08:25:33 -0700 Subject: [gpfsug-discuss] Independent Inode Space Limit In-Reply-To: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> References: <201510231442.t9NEgt6b026687@d03av05.boulder.ibm.com> Message-ID: <201510231525.t9NFPv9P004320@d01av03.pok.ibm.com> >I also see that the first 1024 inodes are immediately allocated upon >creation. I assume these are allocated to internal data structures and >are a copy of a subset of the first 4038 inodes allocated for new file >systems? It would be useful to know if these internal structures are >fixed for independent filesets and if they are not, what factors >determine their layout (for performance purposes). Independent filesets don't have the internal structures that the file system has. Other than the fileset's root directory all of the remaining inodes can be allocated to user files. Inodes are always allocated in full metadata blocks. The inodes for an independent fileset are allocated in their own blocks. This makes fileset snapshots more efficient, since a copy-on-write of the block of inodes will only copy inodes in the fileset. The inode blocks for all filesets are in the same inode file, but the blocks for each independent fileset are strided, making them easy to prefetch for policy scans. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From kallbac at iu.edu Mon Oct 26 02:38:52 2015 From: kallbac at iu.edu (Kallback-Rose, Kristy A) Date: Sun, 25 Oct 2015 22:38:52 -0400 Subject: [gpfsug-discuss] ILM and Backup Question Message-ID: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. 
I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From st.graf at fz-juelich.de Mon Oct 26 08:43:33 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 26 Oct 2015 09:43:33 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: <562DE7B5.7080303@fz-juelich.de> Hi! We at J?lich Supercomputing Centre have two ILM managed file systems (GPFS and HSM from TSM). #50 mio files + 10 PB data on tape #30 mio files + 8 PB data on tape For backup we use mmbackup (dsmc) for the user HOME directory (no ILM) #120 mio files => 3 hours get candidate list + x hour backup We use also mmbackup for the ILM managed filesystem. Policy: the file must be backed up first before migrated to tape 2-3 hour for candidate list + x hours/days/weeks backups (!!!) -> a metadata change (e.g. renaming a directory by the user) enforces a new backup of the files which causes a very expensive tape inline copy! Greetings from J?lich, Germany Stephan On 10/26/15 03:38, Kallback-Rose, Kristy A wrote: Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. 
Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Mon Oct 26 13:42:47 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Mon, 26 Oct 2015 13:42:47 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: We have all of our GPFSmetadata on FlashCache devices (nee Ramsan) and that helps a lot. We also have our data going into monotonically increasing buckets of about 30TB that we call lockers (e.g. locker100, locker101, locker102), with 1 primary active at a time. We have an hourly job that scans the most recent 2 lockers (taked about 45 seconds each) to generate a file list using the ILM 'LIST' policy of all files that have been modified or created in the last hour. That goes to a file that has all of the names which then trickles to a custom backup daemon that has up to 10 threads for rsyncing these over to our HSM server (running GPFS/TSM space management). From there things automatically get backed up and archived. Not all hourlies are necessarily complete (we can't guarantee that nobody is still hanging on to $lockernum-2 for instance), so we have a daily that scans the entire 3PB to find anything created/updated in the last 24 hours and does an rsync on that. There's no harm in duplication of hourlies from the rsync perspective because rsync takes care of that (already exists on destination). The daily job takes about 45 minutes. Needless to say it would be impossible without metadata on a fast flash device. Sent from my android device. -----Original Message----- From: "Kallback-Rose, Kristy A" To: gpfsug main discussion list Sent: Sun, 25 Oct 2015 22:39 Subject: [gpfsug-discuss] ILM and Backup Question Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Mon Oct 26 13:42:47 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Mon, 26 Oct 2015 13:42:47 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: We have all of our GPFSmetadata on FlashCache devices (nee Ramsan) and that helps a lot. We also have our data going into monotonically increasing buckets of about 30TB that we call lockers (e.g. 
locker100, locker101, locker102), with 1 primary active at a time. We have an hourly job that scans the most recent 2 lockers (taked about 45 seconds each) to generate a file list using the ILM 'LIST' policy of all files that have been modified or created in the last hour. That goes to a file that has all of the names which then trickles to a custom backup daemon that has up to 10 threads for rsyncing these over to our HSM server (running GPFS/TSM space management). From there things automatically get backed up and archived. Not all hourlies are necessarily complete (we can't guarantee that nobody is still hanging on to $lockernum-2 for instance), so we have a daily that scans the entire 3PB to find anything created/updated in the last 24 hours and does an rsync on that. There's no harm in duplication of hourlies from the rsync perspective because rsync takes care of that (already exists on destination). The daily job takes about 45 minutes. Needless to say it would be impossible without metadata on a fast flash device. Sent from my android device. -----Original Message----- From: "Kallback-Rose, Kristy A" To: gpfsug main discussion list Sent: Sun, 25 Oct 2015 22:39 Subject: [gpfsug-discuss] ILM and Backup Question Simon wrote recently in the GPFS UG Blog: "We also got into discussion on backup and ILM, and I think its amazing how everyone does these things in their own slightly different way. I think this might be an interesting area for discussion over on the group mailing list. There's a lot of options and different ways to do things!? Yes, please! I?m *very* interested in what others are doing. We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS Integration?we have had HPSS for a very long time), but I?m interested what others are doing with either ILM or other methods to brew their own backup solutions, how much they are backing up and with what regularity, what resources it takes, etc. If you have anything going on at your site that?s relevant, can you please share? Thanks, Kristy Kristy Kallback-Rose Manager, Research Storage Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 26 20:15:26 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Mon, 26 Oct 2015 20:15:26 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: Hi Kristy, Yes thanks for picking this up. So we (UoB) have 3 GPFS environments, each with different approaches. 1. OpenStack (GPFS as infrastructure) - we don't back this up at all. Partly this is because we are still in pilot phase, and partly because we also have ~7PB CEPH over 4 sites for this project, and the longer term aim is for us to ensure data sets and important VM images are copied into the CEPH store (and then replicated to at least 1 other site). We have some challenges with this, how should we do this? We're sorta thinging about maybe going down the irods route for this, policy scan the FS maybe, add xattr onto important data, and use that to get irods to send copies into CEPH (somehow). So this would be a bit of a hybrid home-grown solution going on here. Anyone got suggestions about how to approach this? I know IBM are now an irods consortium member, so any magic coming from IBM to integrate GFPS and irods? 2. HPC. 
We differentiate on our HPC file-system between backed up and non backed up space. Mostly its non backed up, where we encourage users to keep scratch data sets. We provide a small(ish) home directory which is backed up with TSM to tape, and also backup applications and system configs of the system. We use a bunch of jobs to sync some configs into local git which also is stored in the backed up part of the FS, so things like switch configs, icinga config can be backed up sanely. 3. Research Data Storage. This is a large bulk data storage solution. So far its not actually that large (few hundred TB), so we take the traditional TSM back to tape approach (its also sync replicated between data centres). We're already starting to see some possible slowness on this with data ingest and we've only just launched the service. Maybe that is a cause of launching that we suddenly see high data ingest. We are also experimenting with HSM to tape, but other than that we have no other ILM policies - only two tiers of disk, SAS for metadata and NL-SAS for bulk data. I'd like to see a flash tier in there for Metadata, which would free SAS drives and so we might be more into ILM policies. We have some more testing with snapshots to do, and have some questions about recovery of HSM files if the FS is snapshotted. Anyone any experience with this with 4.1 upwards versions of GPFS? Straight TSM backup for us means we can end up with 6 copies of data - once per data centre, backup + offsite backup tape set, HSM pool + offsite copy of HSM pool. (If an HSM tape fails, how do we know what to restore from backup? Hence we make copies of the HSM tapes as well). As our backups run on TSM, it uses the policy engine and mmbackup, so we only backup changes and new files, and never backup twice from the FS. Does anyone know how TSM backups handle XATTRs? This is one of the questions that was raised at meet the devs. Or even other attributes like immutability, as unless you are in complaint mode, its possible for immutable files to be deleted in some cases. In fact this is an interesting topic, it just occurred to me, what happens if your HSM tape fails and it contained immutable files. Would it be possible to recover these files if you don't have a copy of the HSM tape? - can you do a synthetic recreate of the TSM HSM tape from backups? We typically tell users that backups are for DR purposes, but that we'll make efforts to try and restore files subject to resource availability. Is anyone using SOBAR? What is your rationale for this? I can see that at scale, there are lot of benefits to this. But how do you handle users corrupting/deleting files etc? My understanding of SOBAR is that it doesn't give you the same ability to recover versions of files, deletions etc that straight TSM backup does. (this is something I've been meaning to raise for a while here). So what do others do? Do you have similar approaches to not backing up some types of data/areas? Do you use TSM or home-grown solutions? Or even other commercial backup solutions? What are your rationales for making decisions on backup approaches? Has anyone built their own DMAPI type interface for doing these sorts of things? Snapshots only? Do you allow users to restore themselves? If you are using ILM, are you doing it with straight policy, or is TSM playing part of the game? (If people want to comment anonymously on this without committing their company on list, happy to take email to the chair@ address and forward on anonymously to the group). 
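For anyone tempted by the roll-your-own route Doug described earlier in the thread, a rough sketch of the "find what changed and hand it to a copy tool" step might look like the below - the paths, node names and the one-day window are placeholders, not a recommendation:

  cat > /tmp/changed.pol <<'EOF'
  RULE EXTERNAL LIST 'changed' EXEC ''
  RULE 'find-recent' LIST 'changed'
    WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) <= INTERVAL '1' DAYS
  EOF
  mmapplypolicy /gpfs/bulk -P /tmp/changed.pol -I defer \
    -f /tmp/changed -g /gpfs/.policytmp -N nsd01,nsd02
  # the generated file list is then fed to rsync, dsmc or whatever else you trust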
Simon On 26/10/2015, 02:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Kallback-Rose, Kristy A" wrote: >Simon wrote recently in the GPFS UG Blog: "We also got into discussion on >backup and ILM, and I think its amazing how everyone does these things in >their own slightly different way. I think this might be an interesting >area for discussion over on the group mailing list. There's a lot of >options and different ways to do things!? > >Yes, please! I?m *very* interested in what others are doing. > >We (IU) are currently doing a POC with GHI for DR backups (GHI=GPFS HPSS >Integration?we have had HPSS for a very long time), but I?m interested >what others are doing with either ILM or other methods to brew their own >backup solutions, how much they are backing up and with what regularity, >what resources it takes, etc. > >If you have anything going on at your site that?s relevant, can you >please share? > >Thanks, >Kristy > >Kristy Kallback-Rose >Manager, Research Storage >Indiana University From wsawdon at us.ibm.com Mon Oct 26 21:12:55 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Mon, 26 Oct 2015 13:12:55 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <562DE7B5.7080303@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> Message-ID: <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> > From: Stephan Graf > > For backup we use mmbackup (dsmc) > for the user HOME directory (no ILM) > #120 mio files => 3 hours get candidate list + x hour backup That seems rather slow. What version of GPFS are you running? How many nodes are you using? Are you using a "-g global shared directory"? The original mmapplypolicy code was targeted to a single node, so by default it still runs on a single node and you have to specify -N to run it in parallel. When you run multi-node there is a "-g" option that defines a global shared directory that must be visible to all nodes specified in the -N list. Using "-g" with "-N" enables a scale-out parallel algorithm that substantially reduces the time for candidate selection. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From wsawdon at us.ibm.com Mon Oct 26 22:22:58 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Mon, 26 Oct 2015 14:22:58 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> Message-ID: <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> > From: "Simon Thompson (Research Computing - IT Services)" > > Does anyone know how TSM backups handle XATTRs? TSM capture XATTRs and ACLs in an opaque "blob" using gpfs_fgetattrs. Unfortunately, TSM stores the opaque blob with the file data. Changes to the blob require the data to be backed up again. > Or even other attributes like immutability, Immutable files may be backed up and restored as immutable files. Immutability is restored after the data has been restored. > can you do a synthetic recreate of the TSM HSM tape from backups? TSM stores data from backups and data from HSM in different pools. A file that is both HSM'ed and backed up will have at least two copies of data off-line. I suspect that losing a tape from the HSM pool will have no effect on the backup pool, but you should verify that with someone from TSM. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From st.graf at fz-juelich.de Tue Oct 27 07:03:19 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Tue, 27 Oct 2015 08:03:19 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> Message-ID: <562F21B7.8040007@fz-juelich.de> We are running the mmbackup on an AIX system oslevel -s 6100-07-10-1415 Current GPFS build: "4.1.0.8 ". So we only use one node for the policy run. Stephan On 10/26/15 22:12, Wayne Sawdon wrote: > From: Stephan Graf > > For backup we use mmbackup (dsmc) > for the user HOME directory (no ILM) > #120 mio files => 3 hours get candidate list + x hour backup That seems rather slow. What version of GPFS are you running? How many nodes are you using? Are you using a "-g global shared directory"? The original mmapplypolicy code was targeted to a single node, so by default it still runs on a single node and you have to specify -N to run it in parallel. When you run multi-node there is a "-g" option that defines a global shared directory that must be visible to all nodes specified in the -N list. Using "-g" with "-N" enables a scale-out parallel algorithm that substantially reduces the time for candidate selection. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Tue Oct 27 09:02:52 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Tue, 27 Oct 2015 10:02:52 +0100 Subject: [gpfsug-discuss] Spectrum Scale v4.2 In-Reply-To: References: Message-ID: <201510270904.t9R940k4019623@d06av11.portsmouth.uk.ibm.com> see "IBM Spectrum Scale V4.2 delivers simple, efficient,and intelligent data management for highperformance,scale-out storage" http://www.ibm.com/common/ssi/rep_ca/8/897/ENUS215-398/ENUS215-398.PDF Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan at buzzard.me.uk Tue Oct 27 10:47:43 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Tue, 27 Oct 2015 10:47:43 +0000 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <201510262224.t9QMOwRO006986@d03av03.boulder.ibm.com> Message-ID: <1445942863.17909.89.camel@buzzard.phy.strath.ac.uk> On Mon, 2015-10-26 at 14:22 -0800, Wayne Sawdon wrote: [SNIP] > > > > can you do a synthetic recreate of the TSM HSM tape from backups? > > TSM stores data from backups and data from HSM in different pools. A > file that is both HSM'ed and backed up will have at least two copies > of data off-line. I suspect that losing a tape from the HSM pool will > have no effect on the backup pool, but you should verify that with > someone from TSM. > I am pretty sure that you have to restore the files first from backup, and it is a manual process. Least it was for me when a HSM tape went bad in the past. Had to use TSM to generate a list of the files on the HSM tape, and then feed that in to a dsmc restore, before doing a reconcile and removing the tape from the library for destruction. Finally all the files where punted back to tape. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From wsawdon at us.ibm.com Tue Oct 27 15:25:02 2015 From: wsawdon at us.ibm.com (Wayne Sawdon) Date: Tue, 27 Oct 2015 07:25:02 -0800 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <562F21B7.8040007@fz-juelich.de> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> Message-ID: <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 27 17:28:00 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 27 Oct 2015 17:28:00 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm Message-ID: Hi, If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. >From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. This appears to mean that quotas have to be double what we actually want to take account of the replication factor. Is this correct? Second part of the question. If a file is transferred to tape (or compressed maybe as well), does the file still count against quota, and how much for? As on hsm tape its no longer copies=2. Same for a compressed file, does the compressed file count as the original or compressed size against quota? I.e. Could a user accessing a compressed file suddenly go over quota by accessing the file? 
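For reference, the sort of comparison behind the "double the size" observation (file and file system names below are just examples, and --apparent-size assumes GNU du):

  # blocks actually allocated (counts both replicas) vs. logical file size
  ls -ls /gpfs/gpfs0/somefile
  du -k /gpfs/gpfs0/somefile
  du -k --apparent-size /gpfs/gpfs0/somefile
  # quota reporting, which appears to count allocated blocks
  mmlsquota -u someuser gpfs0
  mmrepquota -u gpfs0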
Thanks Simon From Robert.Oesterlin at nuance.com Tue Oct 27 19:48:04 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 27 Oct 2015 19:48:04 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs Message-ID: <4E539EE4-596B-441C-9E60-46072E567765@nuance.com> With Spectrum Scale 4.2 announced, can anyone from IBM comment on what the outlook/process is for fixes and PTFs? When 4.1.1 came out, 4.1.0.X more or less dies, with 4.1.0.8 being the last level ? yes? Then move to 4.1.1 With 4.1.1 ? we are now at 4.1.1-2 and 4.2 is going to GA on 11/20/2015 Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Wed Oct 28 08:06:01 2015 From: st.graf at fz-juelich.de (Stephan Graf) Date: Wed, 28 Oct 2015 09:06:01 +0100 Subject: [gpfsug-discuss] ILM and Backup Question In-Reply-To: <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> Message-ID: <563081E9.2090605@fz-juelich.de> Hi Wayne! We are using -g, and we only want to run it on one node, so we don't use the -N option. Stephan On 10/27/15 16:25, Wayne Sawdon wrote: > From: Stephan Graf > We are running the mmbackup on an AIX system > oslevel -s > 6100-07-10-1415 > Current GPFS build: "4.1.0.8 ". > > So we only use one node for the policy run. > Even on one node you should see a speedup using -g and -N. -Wayne _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ Forschungszentrum Juelich GmbH 52425 Juelich Sitz der Gesellschaft: Juelich Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498 Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender), Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dan.Foster at bristol.ac.uk Wed Oct 28 10:06:10 2015 From: Dan.Foster at bristol.ac.uk (Dan Foster) Date: Wed, 28 Oct 2015 10:06:10 +0000 Subject: [gpfsug-discuss] Quotas, replication and hsm In-Reply-To: References: Message-ID: On 27 October 2015 at 17:28, Simon Thompson (Research Computing - IT Services) wrote: > Hi, > > If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (I.e. Twice the actual size)?. > > From experimentation, it appears to be double the actual size, I.e. Taking into account replication of 2. 
> > This appears to mean that quotas have to be double what we actually want to take account of the replication factor. > > Is this correct? This is what we obverse here by default and currently have to double our fileset quotas to take this is to account on replicated filesystems. You've reminded me that I was going to ask this list if it's possible to report the un-replicated sizes? While the quota management is only a slight pain, what's reported to the user is more of a problem for us(e.g. via SMB share / df ). We're considering replicating a lot more of our filesystems and it would be useful if it didn't appear that everyones quotas had just doubled overnight. Thanks, Dan. -- Dan Foster | Senior Storage Systems Administrator | IT Services From duersch at us.ibm.com Wed Oct 28 12:47:52 2015 From: duersch at us.ibm.com (Steve Duersch) Date: Wed, 28 Oct 2015 08:47:52 -0400 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs Message-ID: >>Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? IBM will continue to create PTFs for the 4.1.1 stream. Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Oct 28 13:06:56 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 28 Oct 2015 13:06:56 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs In-Reply-To: References: Message-ID: Hi Steve Thanks ? that?s puzzling (surprising?) given that 4.1.1 hasn?t really been out that long. (less than 6 months) I?m in a position of deciding of what my upgrade path and timeline should be. If I?m at 4.1.0.X and want to upgrade all my clusters, the ?safer? bet is probably 4.1.1-X. but all the new features are going to end up on the 4.2.X. If 4.2 is going to GA in November, perhaps it?s better to wait for the first 4.2 PTF package. Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 From: > on behalf of Steve Duersch > Reply-To: gpfsug main discussion list > Date: Wednesday, October 28, 2015 at 7:47 AM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs IBM will continue to create PTFs for the 4.1.1 stream. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Oct 28 13:09:52 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 28 Oct 2015 13:09:52 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs In-Reply-To: References: Message-ID: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> All, What about the 4.1.0-x stream? We?re on 4.1.0-8 and will soon be applying an efix to it to take care of the snapshot deletion and ?quotas are wrong? bugs. We?ve also go no immediate plans to go to either 4.1.1-x or 4.2 until they?ve had a chance to ? mature. It?s not that big of a deal - I don?t mind running on the efix for a while. Just curious. Thanks? Kevin On Oct 28, 2015, at 7:47 AM, Steve Duersch > wrote: >>Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? IBM will continue to create PTFs for the 4.1.1 stream. 
Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Oct 28 13:15:30 2015 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 28 Oct 2015 13:15:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs In-Reply-To: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> References: <6AB4198E-DE7C-4F5D-9C3A-0067C85D1AE0@vanderbilt.edu> Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05CF6CF0@CHI-EXCHANGEW1.w2k.jumptrading.com> IBM has stated that there will no longer be PTF releases for 4.1.0, and that 4.1.0-8 is the last PTF release. Thus you?ll have to choose between upgrading to 4.1.1 (which has the latest GPFS Protocols feature, hence the numbering change), or wait and go with the 4.2 release. I heard rumor from somebody at IBM (honestly can?t remember who) that the first 3 releases of any major release has some additional debugging turned up, which is turned off after on the fourth PTF release and those going forward. Does anybody at IBM want to confirm or deny this rumor? I?m also leery of going with the first major release of GPFS (or any software, like RHEL 7.0 for instance). Thanks, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: Wednesday, October 28, 2015 8:10 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale 4.2 - upgrade and ongoing PTFs All, What about the 4.1.0-x stream? We?re on 4.1.0-8 and will soon be applying an efix to it to take care of the snapshot deletion and ?quotas are wrong? bugs. We?ve also go no immediate plans to go to either 4.1.1-x or 4.2 until they?ve had a chance to ? mature. It?s not that big of a deal - I don?t mind running on the efix for a while. Just curious. Thanks? Kevin On Oct 28, 2015, at 7:47 AM, Steve Duersch > wrote: >>Is the plan to ?encourage? the upgrade to 4.2, meaning if you want fixes and are at 4.1.1-x, you move to 4.2, or will IBM continue to PTF the 4.1.1 stream for the foreseeable future? IBM will continue to create PTFs for the 4.1.1 stream. Steve Duersch Spectrum Scale (GPFS) FVTest IBM Poughkeepsie, New York _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. 
From bbanister at jumptrading.com Wed Oct 28 13:25:27 2015
From: bbanister at jumptrading.com (Bryan Banister)
Date: Wed, 28 Oct 2015 13:25:27 +0000
Subject: [gpfsug-discuss] Quotas, replication and hsm
Message-ID: <21BC488F0AEA2245B2C3E83FC0B33DBB05CF6E05@CHI-EXCHANGEW1.w2k.jumptrading.com>

I'm not sure what kind of report you're looking for, but the `du` command has an "--apparent-size" option with this description:

    print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in ('sparse') files, internal fragmentation, indirect blocks, and the like

This can be used to report the logical size of files rather than the space they actually occupy. I think that mmrepquota and mmlsquota show twice the actual file size because of the replication, but somebody correct me if I'm mistaken.

I would also like to know what the ILM "LIST" policy reports for KB_ALLOCATED on replicated files. Is it the replicated amount of data?

Thanks,
-Bryan

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Dan Foster
Sent: Wednesday, October 28, 2015 5:06 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Quotas, replication and hsm

On 27 October 2015 at 17:28, Simon Thompson (Research Computing - IT Services) wrote:
> Hi,
>
> If we have replication enabled on a file, does the size from ls -l or du return the actual file size, or the replicated file size (i.e. twice the actual size)?
>
> From experimentation, it appears to be double the actual size, i.e. taking into account replication of 2.
>
> This appears to mean that quotas have to be double what we actually want, to take account of the replication factor.
>
> Is this correct?

This is what we observe here by default, and we currently have to double our fileset quotas to take this into account on replicated filesystems.

You've reminded me that I was going to ask this list whether it's possible to report the un-replicated sizes. While the quota management is only a slight pain, what's reported to the user is more of a problem for us (e.g. via SMB share / df). We're considering replicating a lot more of our filesystems, and it would be useful if it didn't appear that everyone's quotas had just doubled overnight.

Thanks, Dan.

--
Dan Foster | Senior Storage Systems Administrator | IT Services
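To make the behaviour described in this thread concrete, here is a minimal sketch on a file system created with data replication of 2 (-r 2); the paths, fileset name and figures are made up, and the exact mmlsquota output varies by release:

    # write a 10 MiB test file
    dd if=/dev/zero of=/gpfs/fs0/test/10m bs=1M count=10

    ls -l /gpfs/fs0/test/10m                   # logical size: 10485760 bytes
    du -k /gpfs/fs0/test/10m                   # ~20480 KiB allocated (both copies)
    du -k --apparent-size /gpfs/fs0/test/10m   # ~10240 KiB (un-replicated size)

    mmlsquota -j test fs0                      # block usage also counts both copies

On the systems described in this thread, stat/ls report the un-replicated file size, while the allocated-block figures that du, mmlsquota and mmrepquota work from include every copy.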
From wsawdon at us.ibm.com Wed Oct 28 13:36:27 2015
From: wsawdon at us.ibm.com (Wayne Sawdon)
Date: Wed, 28 Oct 2015 05:36:27 -0800
Subject: [gpfsug-discuss] ILM and Backup Question
In-Reply-To: <563081E9.2090605@fz-juelich.de>
References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> <563081E9.2090605@fz-juelich.de>
Message-ID: <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com>

You have to use both options, even if -N is only the local node.

Sorry,
-Wayne

From: Stephan Graf
Date: 10/28/2015 01:06 AM
Subject: Re: [gpfsug-discuss] ILM and Backup Question
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi Wayne!

We are using -g, and we only want to run it on one node, so we don't use the -N option.

Stephan

On 10/27/15 16:25, Wayne Sawdon wrote:
> > From: Stephan Graf
> > We are running the mmbackup on an AIX system
> > oslevel -s
> > 6100-07-10-1415
> > Current GPFS build: "4.1.0.8"
> >
> > So we only use one node for the policy run.
>
> Even on one node you should see a speedup using -g and -N.
>
> -Wayne
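As a concrete sketch of Wayne's suggestion, a single-node mmbackup run that still passes both options might look roughly like this; the device, node name and shared work directory are illustrative, and option support should be confirmed for the mmbackup level installed:

    mmbackup /gpfs/fs0 -N aixnode01 -g /gpfs/fs0/.mmbackupShared

As I understand it, -g should point at a directory in a shared GPFS file system; together with -N it lets the scan use the parallel sort and policy-evaluation pipeline even when only one node is named.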
From wsawdon at us.ibm.com Wed Oct 28 14:11:25 2015
From: wsawdon at us.ibm.com (Wayne Sawdon)
Date: Wed, 28 Oct 2015 06:11:25 -0800
Subject: [gpfsug-discuss] Quotas, replication and hsm
Message-ID: <201510281412.t9SEChQo030691@d01av03.pok.ibm.com>

> From: "Simon Thompson (Research Computing - IT Services)"
>
> Second part of the question. If a file is transferred to tape (or compressed maybe as well), does the file still count against quota, and how much for? As on HSM tape it's no longer copies=2. Same for a compressed file: does the compressed file count as the original or the compressed size against quota? I.e. could a user accessing a compressed file suddenly go over quota by accessing the file?

Quotas account for space in the file system. If you migrate a user's file to tape, then that user is credited for the space saved. If a later access recalls the file, then the user is again charged for the space. Note that HSM recall is done as "root", which bypasses the quota check -- this allows the file to be recalled even if it pushes the user past his quota limit.

Compression (which is currently in beta) has the same properties. If you compress a file, then the user is credited with the space saved. When the file is uncompressed, the user is again charged. Since uncompression is done by the "user", the quota check is enforced and uncompression can fail. This includes writes to a compressed file.

> From: Bryan Banister
>
> I also would like to know what the output of the ILM "LIST" policy reports for KB_ALLOCATED for replicated files. Is it the replicated amount of data?

KB_ALLOCATED shows the same value that stat shows, so yes, it shows the replicated amount of data actually used by the file.

-Wayne


From makaplan at us.ibm.com Wed Oct 28 14:48:11 2015
From: makaplan at us.ibm.com (makaplan at us.ibm.com)
Date: Wed, 28 Oct 2015 09:48:11 -0500
Subject: [gpfsug-discuss] ILM and Backup Question
In-Reply-To: <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com>
References: <81E9FF09-D666-4BD1-A727-39AF4ED1F54B@iu.edu> <562DE7B5.7080303@fz-juelich.de> <201510262114.t9QLENpG024083@d01av01.pok.ibm.com> <562F21B7.8040007@fz-juelich.de> <201510271526.t9RFQ2Bw027971@d03av02.boulder.ibm.com> <563081E9.2090605@fz-juelich.de> <201510281336.t9SDaiNa015723@d01av01.pok.ibm.com>
Message-ID: <201510281448.t9SEmFsr030044@d01av02.pok.ibm.com>

If you see one or more status messages like this:

    [I] %2$s Parallel-piped sort and policy evaluation. %1$llu files scanned. %3$s

then you are getting the (potentially) fastest version of the GPFS inode and policy scanning algorithm.

You may also want to adjust the -a and -A options of the mmapplypolicy command, as mentioned in the command documentation. Oh, I see the documentation for -A is wrong in many versions of the manual. There is an attempt to automagically estimate the proper number of buckets, based on the inodes-allocated count.

If you want to investigate performance further, I recommend you use our debug option -d 7, or set the environment variable MM_POLICY_DEBUG_BITS=7 - this will show you how the work is divided among the nodes and threads.

--marc of GPFS
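Tying Wayne's KB_ALLOCATED answer and Marc's tuning notes together, a small external-list policy run is one way to see the logical size next to the replicated allocation, and to watch the parallel-scan messages Marc describes. This is only a sketch: the rule names, paths, node list and the -a/-A values are illustrative and should be checked against the documentation for your release.

    /* list-alloc.pol : logical size vs. allocated (replicated) kilobytes */
    RULE EXTERNAL LIST 'sizecheck' EXEC ''
    RULE 'all' LIST 'sizecheck'
         SHOW(VARCHAR(FILE_SIZE) || ' ' || VARCHAR(KB_ALLOCATED))

    # run the scan with the parallel options and debug output enabled
    MM_POLICY_DEBUG_BITS=7 mmapplypolicy /gpfs/fs0 -P list-alloc.pol \
        -N nsd01,nsd02 -g /gpfs/fs0/.policytmp \
        -a 4 -A 61 \
        -I defer -f /tmp/fs0

With -I defer and -f, the matched files should end up in a flat file named something like /tmp/fs0.list.sizecheck rather than being handed to an interface script.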
From knop at us.ibm.com Thu Oct 29 14:14:58 2015
From: knop at us.ibm.com (Felipe Knop)
Date: Thu, 29 Oct 2015 09:14:58 -0500
Subject: [gpfsug-discuss] Intro (new member)

Hi,

I have just joined the GPFS (Spectrum Scale) UG list. I work in the GPFS development team. I had the chance to attend the "Inaugural USA Meet the Devs" session in New York City on Oct 7, which was a valuable opportunity to hear from customers using the product.

Felipe

----
Felipe Knop knop at us.ibm.com
GPFS Development
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314


From carlz at us.ibm.com Fri Oct 30 15:14:50 2015
From: carlz at us.ibm.com (Carl Zetie)
Date: Fri, 30 Oct 2015 10:14:50 -0500
Subject: [gpfsug-discuss] Making an RFE Public (and an intro)
Message-ID: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com>

First the intro: I am the new Product Manager joining the Spectrum Scale team, taking the place of Janet Ellsworth. I'm looking forward to meeting with you all.

I also have some news about RFEs: we are working to enable you to choose whether your RFEs for Scale are private or public. I know that many of you have requested public RFEs so that other people can see and vote on them. We'd like to see that too, as it's very valuable information for us (as well as reducing duplicates). So here's what we're doing:

Short term: If you have an existing RFE that you would like to see made Public, please email me with the ID of the RFE. You can find my email address at the foot of this message. PLEASE don't email the entire list!
Medium term: We are working to allow you to choose, at the time of submission, whether a request will be Private or Public. Unfortunately, for internal technical reasons we can't simply make the Public / Private field selectable at submission time (don't ask!), so instead we are creating two submission queues, one for Private RFEs and another for Public RFEs. So when you submit an RFE in future, you'll start by selecting the appropriate queue. Inside IBM, they all go into the same evaluation process. As soon as I have an update on the availability of this fix, I will share it with the group.

Note that even for Public requests, some fields remain Private and hidden from other viewers, e.g. Business Case (look for the "key" icon next to the field to confirm).

regards,
Carl

Carl Zetie
Product Manager for Spectrum Scale, IBM
(540) 882 9353 | 15750 Brookhill Ct, Waterford VA 20197
carlz at us.ibm.com


From jfhamano at us.ibm.com Fri Oct 30 15:29:58 2015
From: jfhamano at us.ibm.com (John Hamano)
Date: Fri, 30 Oct 2015 07:29:58 -0800
Subject: [gpfsug-discuss] Making an RFE Public (and an intro)
In-Reply-To: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com>
References: <201510301520.t9UFKTUP032501@d01av04.pok.ibm.com>
Message-ID: <201510301530.t9UFUM0M004729@d03av05.boulder.ibm.com>

Hi Carl, welcome and congratulations on your new role. I am North America Brand Sales for ESS and Spectrum Scale. Let me know when you have some time next week to talk.