From andi at christiansen.xxx Wed Apr 1 10:04:56 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 1 Apr 2020 11:04:56 +0200 (CEST) Subject: [gpfsug-discuss] Enabling SSL/HTTPS/ on Object S3. Message-ID: <706418212.158040.1585731896422@privateemail.com> Hi, We are trying to enable S3 on the object protocol within scale but there seem to be little to no documentation to enable https endpoints for the S3 protocol? According to the documentation enabling S3 for the keystone server is possible with the mmuserauth command but when i try to run it as IBM have documented, it says that Object protocol is not correctly installed.. And yes it hasnt been configured yet.. The "mmobj swift base" command which is used to configure Object/S3 automatically includes the "mmuserauth" command without the ssl option enabled.. and then all endpoints will start with http:// I hope that anyone out there have a guide to do this ? or is able to explain how to set it up? Basically all i need is this: https://s3.something.com:8080 which points to the WAN ip of the CES cluster (already configured and ready) and endpoints like this: None | keystone | identity | True | public | https://cluster_domain:5000/ RegionOne | swift | object-store | True | public | https://cluster_domain:443/v1/AUTH_%(tenant_id)s RegionOne | swift | object-store | True | public | https://cluster_domain:8080/v1/AUTH_%(tenant_id)s if i manually add those endpoints and put my certificates in /etc/swift/ and update the config it says (SSL: Wrong_Version_Number). Here is output: C:\Users\Andi Christiansen>aws --endpoint-url https://WAN_IP/DOMAIN https://WAN :443 s3 ls SSL validation failed for https://WAN_IP/DOMAIN:443/ [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1076) C:\Users\Andi Christiansen>aws --endpoint-url https://WAN_IP/DOMAIN:8080 s3 ls SSL validation failed for https://WAN_IP/DOMAIN:8080/ [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076) its only port 8080 and 5000 that is allowed through the firewall, so i only tested with 443 to see if it gave another error as it is not allowed through and it did.. It works just fine when "mmobj swift base" is run normally and i only have http endpoints, then it is reachable from local network or WAN with no issues.. Thanks in advance! Best Regards Andi Christiansen -------------- next part -------------- An HTML attachment was scrubbed... URL: From smita.raut at in.ibm.com Wed Apr 1 10:52:44 2020 From: smita.raut at in.ibm.com (Smita J Raut) Date: Wed, 1 Apr 2020 15:22:44 +0530 Subject: [gpfsug-discuss] Enabling SSL/HTTPS/ on Object S3. In-Reply-To: <706418212.158040.1585731896422@privateemail.com> References: <706418212.158040.1585731896422@privateemail.com> Message-ID: Hi Andi, For object SSL configuration you need to reconfigure auth after "mmobj swift base". Instructions are here- https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_configlocalauthssl.htm Some more info on object auth configuration- https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-authentication-for-object-deep-dive (Check slide 26) Thanks, Smita From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 04/01/2020 02:35 PM Subject: [EXTERNAL] [gpfsug-discuss] Enabling SSL/HTTPS/ on Object S3. 
Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We are trying to enable S3 on the object protocol within scale but there seem to be little to no documentation to enable https endpoints for the S3 protocol? According to the documentation enabling S3 for the keystone server is possible with the mmuserauth command but when i try to run it as IBM have documented, it says that Object protocol is not correctly installed.. And yes it hasnt been configured yet.. The "mmobj swift base" command which is used to configure Object/S3 automatically includes the "mmuserauth" command without the ssl option enabled.. and then all endpoints will start with http:// I hope that anyone out there have a guide to do this ? or is able to explain how to set it up? Basically all i need is this: https://s3.something.com:8080 which points to the WAN ip of the CES cluster (already configured and ready) and endpoints like this: None | keystone | identity | True | public | https://cluster_domain:5000/ RegionOne | swift | object-store | True | public | https://cluster_domain:443/v1/AUTH_%(tenant_id)s RegionOne | swift | object-store | True | public | https://cluster_domain:8080/v1/AUTH_%(tenant_id)s if i manually add those endpoints and put my certificates in /etc/swift/ and update the config it says (SSL: Wrong_Version_Number). Here is output: C:\Users\Andi Christiansen>aws --endpoint-url https://WAN_IP/DOMAIN:443 s3 ls SSL validation failed for https://WAN_IP/DOMAIN:443/ [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1076) C:\Users\Andi Christiansen>aws --endpoint-url https://WAN_IP/DOMAIN:8080 s3 ls SSL validation failed for https://WAN_IP/DOMAIN:8080/ [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076) its only port 8080 and 5000 that is allowed through the firewall, so i only tested with 443 to see if it gave another error as it is not allowed through and it did.. It works just fine when "mmobj swift base" is run normally and i only have http endpoints, then it is reachable from local network or WAN with no issues.. Thanks in advance! Best Regards Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=ZKPP3G6NR3aLNRqaXZWW90vDcvevU1hcxJA6_1Up8Ic&m=ZSHZbcegNHURIVsXPDASH5sTFwYAZYYLv-RnoaKNzxw&s=n1X6h1EYg8gdiHH8BFe4OYVQvIMSxoYXRMX3SC2IaBY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Wed Apr 1 12:21:37 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Wed, 1 Apr 2020 13:21:37 +0200 (CEST) Subject: [gpfsug-discuss] Enabling SSL/HTTPS/ on Object S3. In-Reply-To: References: <706418212.158040.1585731896422@privateemail.com> Message-ID: <1057409925.160136.1585740097841@privateemail.com> Hi Smita, Thanks for your reply. i have tried what you suggested. mmobj swift base ran fine. but after i have deleted the userauth and try to set it up again with ks-ssl enabled it just hangs: # mmuserauth service create --data-access-method object --type local --enable-ks-ssl still waiting for it to finish, 15 mins now.. :) Best Regards Andi Christiansen > On April 1, 2020 11:52 AM Smita J Raut wrote: > > > Hi Andi, > > For object SSL configuration you need to reconfigure auth after "mmobj swift base". 
Instructions are here- > https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_configlocalauthssl.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_configlocalauthssl.htm > > Some more info on object auth configuration- > https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-authentication-for-object-deep-dive https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-authentication-for-object-deep-dive (Check slide 26) > > Thanks, > Smita > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 04/01/2020 02:35 PM > Subject: [EXTERNAL] [gpfsug-discuss] Enabling SSL/HTTPS/ on Object S3. > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > --------------------------------------------- > > > > Hi, > > We are trying to enable S3 on the object protocol within scale but there seem to be little to no documentation to enable https endpoints for the S3 protocol? > > According to the documentation enabling S3 for the keystone server is possible with the mmuserauth command but when i try to run it as IBM have documented, it says that Object protocol is not correctly installed.. And yes it hasnt been configured yet.. > > The "mmobj swift base" command which is used to configure Object/S3 automatically includes the "mmuserauth" command without the ssl option enabled.. and then all endpoints will start with http:// > > > I hope that anyone out there have a guide to do this ? or is able to explain how to set it up? > > > Basically all i need is this: > > https://s3.something.com:8080 https://s3.something.com:8080 which points to the WAN ip of the CES cluster (already configured and ready) > > and endpoints like this: > > None | keystone | identity | True | public | https://cluster_domain:5000/ https://cluster_domain:5000/ > RegionOne | swift | object-store | True | public | https://cluster_domain:443/v1/AUTH_%(tenant_id)s > RegionOne | swift | object-store | True | public | https://cluster_domain:8080/v1/AUTH_%(tenant_id)s > > if i manually add those endpoints and put my certificates in /etc/swift/ and update the config it says (SSL: Wrong_Version_Number). Here is output: > > C:\Users\Andi Christiansen>aws --endpoint-url https://WAN_IP/DOMAIN https://WAN :443 s3 ls > SSL validation failed for https://WAN_IP/DOMAIN:443/ https://WAN_IP/DOMAIN:443/ [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1076) > C:\Users\Andi Christiansen>aws --endpoint-url https://WAN_IP/DOMAIN:8080 https://WAN_IP/DOMAIN:8080 s3 ls > SSL validation failed for https://WAN_IP/DOMAIN:8080/ https://WAN_IP/DOMAIN:8080/ [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076) > > > its only port 8080 and 5000 that is allowed through the firewall, so i only tested with 443 to see if it gave another error as it is not allowed through and it did.. > > > It works just fine when "mmobj swift base" is run normally and i only have http endpoints, then it is reachable from local network or WAN with no issues.. > > > > Thanks in advance! > > > Best Regards > Andi Christiansen _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
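In short, the reconfiguration described in this thread amounts to roughly the following sequence. This is a sketch only and has not been verified here: the endpoint name s3.example.com, the port and the certificate paths are placeholders, and the certificate staging steps are the ones in the Knowledge Center page linked above.

# Re-create the object auth that "mmobj swift base" set up, this time with
# keystone SSL enabled (certificates must already be staged per the KC doc).
mmuserauth service remove --data-access-method object
mmuserauth service create --data-access-method object --type local --enable-ks-ssl

# Test from a client. With a self-signed certificate, either point the AWS CLI
# at the CA bundle that signed it, or (for a quick test only) skip verification.
aws --endpoint-url https://s3.example.com:8080 --ca-bundle /path/to/ca.pem s3 ls
aws --endpoint-url https://s3.example.com:8080 --no-verify-ssl s3 ls

# An "SSL: WRONG_VERSION_NUMBER" failure usually means the port is still
# answering in plain http, i.e. the endpoint was never actually switched to TLS.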
URL: From jonathan.buzzard at strath.ac.uk Wed Apr 1 15:06:43 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 1 Apr 2020 15:06:43 +0100 Subject: [gpfsug-discuss] DSS-G dowloads Message-ID: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Wed Apr 1 15:40:30 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 1 Apr 2020 14:40:30 +0000 Subject: [gpfsug-discuss] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> It moved. We had email notifications about this ages ago. Accounts were created automatically for us for those on the contract admin role. 2.5c is latest release (5.0.4-1.6 or 4.2.3-18) Go to https://commercial.lenovo.com/ Simon Simon ?On 01/04/2020, 15:06, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jroche at lenovo.com Wed Apr 1 15:34:36 2020 From: jroche at lenovo.com (Jim Roche) Date: Wed, 1 Apr 2020 14:34:36 +0000 Subject: [gpfsug-discuss] [External] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: Hi Jonathan, I don't think the site has moved. I'm investigating why it cannot be found and will let you know what is going on. Regards, Jim Jim Roche Head of Research Computing University Relations Manager Redwood, 3 Chineham Business Park, Crockford Lane Basingstoke Hampshire RG24 8WQ Lenovo UK +44 7702678579 jroche at lenovo.com ? Lenovo.com? Twitter?|?Instagram?|?Facebook?|?Linkedin?|?YouTube?|?Privacy? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 01 April 2020 15:07 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] DSS-G dowloads I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ncalimet at lenovo.com Wed Apr 1 15:46:32 2020 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Wed, 1 Apr 2020 14:46:32 +0000 Subject: [gpfsug-discuss] [External] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: <477be93f0bc8411a8d8c31935db28a4f@lenovo.com> The old Lenovo ESD website is gone; retired some time ago. Please visit instead: https://commercial.lenovo.com FWIW the most current release is DSS-G 2.5c. Thanks -- Nicolas Calimet, PhD | HPC System Architect | Lenovo DCG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Wednesday, April 1, 2020 16:07 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] DSS-G dowloads I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Apr 1 19:50:28 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 1 Apr 2020 19:50:28 +0100 Subject: [gpfsug-discuss] DSS-G dowloads In-Reply-To: <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> Message-ID: On 01/04/2020 15:40, Simon Thompson wrote: > It moved. We had email notifications about this ages ago. Accounts > were created automatically for us for those on the contract admin > role. 2.5c is latest release (5.0.4-1.6 or 4.2.3-18) > You are right once I search my spam folder. Thanks a bunch Microsoft. I am still not convinced that are still not evil. They seem determined to put my CentOS security emails in the spam folder. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jkavitsky at 23andme.com Fri Apr 3 23:25:33 2020 From: jkavitsky at 23andme.com (Jim Kavitsky) Date: Fri, 3 Apr 2020 15:25:33 -0700 Subject: [gpfsug-discuss] fast search for archivable data sets Message-ID: Hello everyone, I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, Jim Kavitsky -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pinto at scinet.utoronto.ca Sat Apr 4 00:45:18 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 3 Apr 2020 19:45:18 -0400 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: Hi Jim, If you never worked with policy rules before, you may want to start by building your nerves to it. In the /usr/lpp/mmfs/samples/ilm path you will find several examples of templates that you can use to play around. I would start with the 'list' rules first. Some of those templates are a bit complex, so here is one script that I use on a regular basis to detect files larger than 1MB (you can even exclude specific filesets): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-large /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'largefiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files that have more than 1MB of space allocated. */ RULE 'r2' LIST 'largefiles' SHOW('-u' vc(USER_ID) || ' -s' || vc(FILE_SIZE)) /*FROM POOL 'system'*/ FROM POOL 'data' /*FOR FILESET('root')*/ WEIGHT(FILE_SIZE) WHERE KB_ALLOCATED > 1024 /* Files in special filesets, such as mmpolicyRules, are never moved or deleted */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('mmpolicyRules','todelete','tapenode-stuff','toarchive') ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ And here is another to detect files not looked at for more than 6 months. I found more effective to use atime and ctime. You could combine this with the one above to detect file size as well. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-atime-ctime-gt-6months /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'accessedfiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc, accessed prior to a certain date AND that are not owned by root. Include the owner's id with each object and sort them by the owner's id */ /* Files in special filesets, such as mmpolicyRules, are never moved or deleted */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET ('scratch-root','todelete','root') RULE 'r5' LIST 'accessedfiles' DIRECTORIES_PLUS FROM POOL 'data' SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -c' || vc(CREATION_TIME) || ' -s ' || vc(FILE_SIZE)) WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 183) AND (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME) > 183) AND NOT USER_ID = 0 AND NOT (PATH_NAME LIKE '/gpfs/fs0/scratch/r/root/%') ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that both these scripts work on a system wide (or root fileset) basis, and will not give you specific directories, unless you run them several times on specific directories (not very efficient). To produce general lists per directory you would need to do some post processing on the lists, with 'awk' or some other scripting language. If you need some samples I can send you. And finally, you need to be more specific by what you mean by 'archivable'. Once you produce the list you can do several things with them or leverage the rules to actually execute things, such as move, delete, or hsm stuff. The /usr/lpp/mmfs/samples/ilm path has some samples as well. 
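As a rough illustration of that awk post-processing (untested here; the filesystem path /gpfs/fs0, the 2-year threshold and the output prefix are placeholders, and the exact list-record columns depend on the SHOW clause you use):

cat > /tmp/old-files.rules <<'EOF'
/* Write matches to a list file instead of calling an external script */
RULE EXTERNAL LIST 'oldfiles' EXEC ''
RULE 'old' LIST 'oldfiles'
     SHOW(VARCHAR(FILE_SIZE))
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 730
EOF

# -I defer with -f writes the matches to /tmp/scan.list.oldfiles
mmapplypolicy /gpfs/fs0 -P /tmp/old-files.rules -I defer -f /tmp/scan

# Sum the sizes per top-level directory: each record ends in
# "<...> <FILE_SIZE> -- <full path>", so take the last field before the
# " -- " separator as the size and the leading path components after it as the key.
awk -F ' -- ' '{ n = split($1, a, " "); size = a[n];
                 split($2, p, "/"); dir = "/" p[2] "/" p[3] "/" p[4];
                 sum[dir] += size }
           END { for (d in sum) printf "%14.1f GiB  %s\n", sum[d]/2^30, d }' \
    /tmp/scan.list.oldfiles | sort -rn | head -20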
On 4/3/2020 18:25:33, Jim Kavitsky wrote: > Hello everyone, > I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking?for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. > > Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, yes, there is another way, the 'mmfind' utility, also in the same sample path. You have to compile it for you OS (mmfind.README). This is a very powerful canned procedure that lets you run the "-exec" option just as in the normal linux version of 'find'. I use it very often, and it's just as efficient as the other policy rules based alternative. Good luck. Keep safe and confined. Jaime > > Jim Kavitsky > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > . . . ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From alex at calicolabs.com Sat Apr 4 00:50:50 2020 From: alex at calicolabs.com (Alex Chekholko) Date: Fri, 3 Apr 2020 16:50:50 -0700 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: Hi Jim, The common non-GPFS-specific way is to use a tool that dumps all of your filesystem metadata into an SQL database and then you can have a webapp that makes nice graphs/reports from the SQL database, or do your own queries. The Free Software example is "Robinhood" (use the POSIX scanner, not the lustre-specific one) and one proprietary example is Starfish. In both cases, you need a pretty beefy machine for the DB and the scanning of your filesystem may take a long time, depending on your filesystem performance. And then without any filesystem-specific hooks like a transaction log, you'll need to rescan the entire filesystem to update your db. Regards, Alex On Fri, Apr 3, 2020 at 3:25 PM Jim Kavitsky wrote: > Hello everyone, > I'm managing a low-multi-petabyte Scale filesystem with hundreds of > millions of inodes, and I'm looking for the best way to locate archivable > directories. For example, these might be directories where whose contents > were greater than 5 or 10TB, and whose contents had atimes greater than two > years. > > Has anyone found a great way to do this with a policy engine run? If not, > is there another good way that anyone would recommend? Thanks in advance, > > Jim Kavitsky > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Sat Apr 4 01:26:22 2020 From: cblack at nygenome.org (Christopher Black) Date: Sat, 4 Apr 2020 00:26:22 +0000 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: As Alex mentioned, there are tools that will keep filesystem metadata in a database and provide query tools. 
NYGC uses Starfish and we've had good experience with it. At first the only feature we used is 'sfdu', which is a quick replacement for recursive du. Using this we can script csv reports for selections of dirs. As we use Starfish more, we've started opening the web interface to people to look at selected areas of our filesystems, where they can sort directories by size, mtime, atime, and run other reports and queries. We've also started using the tagging functionality so we can quickly get an aggregate total (and growth over time) by tag across multiple directories. We tried Robinhood years ago but found it was taking too much work to get it to scale to 100s of millions of files and 10s of PiB on gpfs. It might be better now. IBM has a metadata product called Spectrum Discover that has the benefit of using gpfs-specific interfaces to be always up to date. Many of the other tools require scheduling scans to update the db. Igneous has a commercial tool called DataDiscover which also looked promising. ClarityNow and MediaFlux are other similar tools. I expect all of these tools at the very least have nice replacements for du and find as well as some sort of web directory tree view. We had run Starfish for a while and did a re-evaluation of a few options in 2019 and ultimately decided to stay with Starfish for now. Best, Chris

From: on behalf of Alex Chekholko Reply-To: gpfsug main discussion list Date: Friday, April 3, 2020 at 7:51 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] fast search for archivable data sets

Hi Jim, The common non-GPFS-specific way is to use a tool that dumps all of your filesystem metadata into an SQL database and then you can have a webapp that makes nice graphs/reports from the SQL database, or do your own queries. The Free Software example is "Robinhood" (use the POSIX scanner, not the lustre-specific one) and one proprietary example is Starfish. In both cases, you need a pretty beefy machine for the DB and the scanning of your filesystem may take a long time, depending on your filesystem performance. And then without any filesystem-specific hooks like a transaction log, you'll need to rescan the entire filesystem to update your db. Regards, Alex

On Fri, Apr 3, 2020 at 3:25 PM Jim Kavitsky > wrote: Hello everyone, I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, Jim Kavitsky _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

________________________________ This message is for the recipient's use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From leslie.james.elliott at gmail.com Sat Apr 4 07:00:34 2020 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 4 Apr 2020 16:00:34 +1000 Subject: [gpfsug-discuss] afmHashVersion Message-ID: I was wondering if there was any more information on the different values for afmHashVersion the default value is 2 but if we want to assign an afmGateway to a fileset we need a value of 5 is there likely to be any performance degradation because of this change do the home cluster and the cache cluster both have to be set to 5 for the fileset allocation to gateways just trying to find a little more information before we try this on a production system with a large number of afm independent filesets leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Sat Apr 4 22:57:33 2020 From: spectrumscale at kiranghag.com (KG) Date: Sun, 5 Apr 2020 03:27:33 +0530 Subject: [gpfsug-discuss] io500 - mmfind - Pfind found 0 matches, something is wrong with the script. Message-ID: Hi Folks I am trying to setup IO500 test on a scale cluster and looking for more info on mmfind. I have compiled mmfindUtil_processOutputFile and updated the correct path in mmfind.sh. The runs however do not come up with any matches. Any pointers wrt something that I may have missed? TIA [Starting] mdtest_hard_write [Exec] mpirun -np 2 /tools/io-500-dev-master/bin/mdtest -C -t -F -P -w *3901 *-e 3901 -d /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/mdt_hard -n 950000 -x /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/mdt_hard-stonewall -a POSIX -N 1 -Y -W 5 [Results] in /ibm/nasdata/results/2020.04.05-03.20.00/mdtest_hard_write.txt. [Warning] This cannot be an official IO-500 score. The phase runtime of 9.8918s is below 300s. [Warning] Suggest io500_mdtest_hard_files_per_proc=30732525 [RESULT-invalid] IOPS phase 2 mdtest_hard_write 0.225 kiops : time 8.99 seconds [Starting] find [Exec] mpirun -np 2 /tools/io-500-dev-master/bin/mmfind.sh /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00 -newer /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/timestampfile -size *3901c *-name "*01*" [Results] in /ibm/nasdata/results/2020.04.05-03.20.00/find.txt. *[Warning] Pfind found 0 matches, something is wrong with the script.* [FIND] *MATCHED 0/3192* in 12.0671 seconds -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Apr 6 10:16:49 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 6 Apr 2020 14:46:49 +0530 Subject: [gpfsug-discuss] afmHashVersion In-Reply-To: References: Message-ID: afmHashVersion=5 does not cause any performance degradation, this hash version allows assigning a gateway for the fileset using mmchfileset command. This option is not required for AFM home cluster(assuming that home is not a cache for other home). It is needed only at the AFM cache cluster and at client cluster if it remote mounts the AFM cache cluster. For changing afmHashVersion=5, all the nodes in the AFM cache and client cluster have to be upgraded to the minimum 5.0.2 level. This option cannot be set dynamically using -i/-I option, all the nodes in the both AFM cache and client clusters have to be shutdown to set this option. It is recommended to use 5.0.4-3 or later. 
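Put together, the change amounts to roughly the following sketch, with placeholder names (filesystem fs0, fileset cachefileset, gateway node gw1). The exact mmchfileset syntax for pinning a gateway may differ by release, so verify it against the documentation; the mmchconfig reference is linked below.

# afmHashVersion cannot be changed with -i/-I, so GPFS must be down on all
# nodes of the cache cluster (and of any client cluster remote-mounting it).
mmshutdown -a
mmchconfig afmHashVersion=5
mmstartup -a

# With hash version 5, a gateway node can be assigned per fileset
# (illustrative syntax; verify against the mmchfileset documentation):
mmchfileset fs0 cachefileset -p afmGateway=gw1

# Check the assignment:
mmlsfileset fs0 cachefileset --afm -L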
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_mmchconfig.htm ~Venkat (vpuvvada at in.ibm.com) From: leslie elliott To: gpfsug main discussion list Date: 04/04/2020 11:30 AM Subject: [EXTERNAL] [gpfsug-discuss] afmHashVersion Sent by: gpfsug-discuss-bounces at spectrumscale.org I was wondering if there was any more information on the different values for afmHashVersion the default value is 2 but if we want to assign an afmGateway to a fileset we need a value of 5 is there likely to be any performance degradation because of this change do the home cluster and the cache cluster both have to be set to 5 for the fileset allocation to gateways just trying to find a little more information before we try this on a production system with a large number of afm independent filesets leslie _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=Y5qSHFJ-z_7fbgD3YvcDG0SCsJbJ5rvNPBI5y5eF6Ec&s=b7XaEKNTas9WQ9qZNBSOW2XDvQNzUMTgdcAb7lQ4170&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.caubet at psi.ch Mon Apr 6 12:20:59 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Mon, 6 Apr 2020 11:20:59 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Message-ID: Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Mon Apr 6 13:25:22 2020 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Mon, 6 Apr 2020 14:25:22 +0200 Subject: [gpfsug-discuss] =?utf-8?q?=22csm=5Fresync=5Fneeded=22_after_upgr?= =?utf-8?q?ading_to_GPFS=09v5=2E0=2E4-2?= In-Reply-To: References: Message-ID: Hi, are the nodes running on AIX? If so my advice would be to change /var/mmfs/mmsysmon/mmsysmonitor.conf to read [InterNodeEventing] usesharedlib = 0 and the do a "mmsysmoncontrol restart". What was the min. release level before the upgrade? For most other cases a "mmsysmoncontrol restart" on the affected nodes + cluster manager node should do. 
Mit freundlichen Grüßen / Kind regards Norbert Schuld

From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06.04.2020 13:36 Subject: [EXTERNAL] [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Sent by: gpfsug-discuss-bounces at spectrumscale.org

Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From marc.caubet at psi.ch Mon Apr 6 13:54:43 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Mon, 6 Apr 2020 12:54:43 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: References: , Message-ID: <46571b6503544f329029b2520c70152e@psi.ch>

Hi Norbert, thanks a lot for answering. The nodes are running RHEL 7.7 (kernel 3.10.0-1062.12.1.el7.x86_64). The previous version was 5.0.3-2. I restarted mmsysmoncontrol (I kept usesharedlib=1 since this is RHEL). Restarting it clears the mmhealth messages as expected; let's see whether the error comes back, which may take several minutes. Note that while I had a mix of 5.0.3-2 and 5.0.4-2 I received some 'stale_mount' messages (from the GPFS GUI) for remote cluster filesystem mountpoints, although everything apparently worked fine. After upgrading everything to v5.0.4-2 the same nodes report 'csm_resync_needed' instead (no more 'stale_mount' errors since then). I am not sure whether that is related, but it might be a hint. Best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch

________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Norbert Schuld Sent: Monday, April 6, 2020 2:25:22 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Hi, are the nodes running on AIX?
If so my advice would be to change /var/mmfs/mmsysmon/mmsysmonitor.conf to read [InterNodeEventing] usesharedlib = 0 and the do a "mmsysmoncontrol restart". What was the min. release level before the upgrade? For most other cases a "mmsysmoncontrol restart" on the affected nodes + cluster manager node should do. Mit freundlichen Gr??en / Kind regards Norbert Schuld [Inactive hide details for "Caubet Serrabou Marc (PSI)" ---06.04.2020 13:36:28---Hi all, after upgrading one of the clusters to]"Caubet Serrabou Marc (PSI)" ---06.04.2020 13:36:28---Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06.04.2020 13:36 Subject: [EXTERNAL] [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From lists at esquad.de Mon Apr 6 13:50:28 2020 From: lists at esquad.de (Dieter Mosbach) Date: Mon, 6 Apr 2020 14:50:28 +0200 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: References: Message-ID: <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> Am 06.04.2020 um 13:20 schrieb Caubet Serrabou Marc (PSI): > Hi all, > > > after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. > > Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. > > > Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. This seems to be a bug in v5, open a support case. 
We had to check: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "grep usesharedlib /var/mmfs/mmsysmon/mmsysmonitor.conf" and to change: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "sed -i 's/usesharedlib = 1/usesharedlib = 0/g' /var/mmfs/mmsysmon/mmsysmonitor.conf" mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "mmsysmoncontrol restart" Regards, Dieter From marc.caubet at psi.ch Tue Apr 7 07:38:42 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Tue, 7 Apr 2020 06:38:42 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> References: , <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> Message-ID: <66cfa1b3942d45489c611d72e5b39d42@psi.ch> Hi, just for the record, after restarting mmsysmoncontrol on all nodes looks like the errors disappeared and no longer appear (and it has been running for several hours already). No need to change usesharedlib, which I have it enabled (1) for RHEL systems. Thanks a lot for your help, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Dieter Mosbach Sent: Monday, April 6, 2020 2:50:28 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Am 06.04.2020 um 13:20 schrieb Caubet Serrabou Marc (PSI): > Hi all, > > > after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. > > Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. > > > Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. This seems to be a bug in v5, open a support case. We had to check: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "grep usesharedlib /var/mmfs/mmsysmon/mmsysmonitor.conf" and to change: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "sed -i 's/usesharedlib = 1/usesharedlib = 0/g' /var/mmfs/mmsysmon/mmsysmonitor.conf" mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "mmsysmoncontrol restart" Regards, Dieter _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Tue Apr 14 08:42:12 2020 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Tue, 14 Apr 2020 09:42:12 +0200 Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 Message-ID: *FYI* IBM Spectrum Discover is a next-generation metadata management solution that delivers exceptional performance at exabyte scale, so organizations can harness value from massive amounts of unstructured data from heterogeneous file and object storage on premises and in the cloud to create competitive advantage in the areas of analytics and AI initiatives, governance, and storage optimization. 
Here are other videos in this series of related IBM Spectrum Discover topics that give you examples to get started: 1) IBM Spectrum Discover: Download, Deploy, and Configure https://youtu.be/FMOuzn__qRI 2) IBM Spectrum Discover: Scanning S3 data sources such as Amazon S3 or Ceph https://youtu.be/zaADfeTGwzY 3) IBM Spectrum Discover: Scanning IBM Spectrum Scale (GPFS) and IBM ESS data sources https://youtu.be/3mBQciR2tXE 4) IBM Spectrum Discover: Scanning an IBM Spectrum Protect data source https://youtu.be/wdXvnJ_GEQs 5) IBM Spectrum Discover: Insights into your files for better TCO with IBM Spectrum Archive EE https://youtu.be/_YNfFDdMEa4 Appendix: Here are additional online educational materials related to IBM Spectrum Discover solutions: IBM Spectrum Discover Knowledge Center: https://www.ibm.com/support/knowledgecenter/SSY8AC IBM Spectrum Discover Free 90 Day Trial: https://www.ibm.com/us-en/marketplace/spectrum-discover IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage, REDP-5550: http://www.redbooks.ibm.com/abstracts/redp5550.html -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Am Weiher 24, 65451 Kelsterbach, Germany mailto:kraemerf at de.ibm.com Mobile +49171-3043699 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Apr 14 11:15:41 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 14 Apr 2020 10:15:41 +0000 Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 In-Reply-To: References: Message-ID: <714908f022894851b52efa0944c80737@bham.ac.uk> Just a reminder that this is a Spectrum Scale technical forum and shouldn't be used for marketing nor advertising of other products. There are a number of vendors who have competing products who might also wish to post here. If you wish to discuss Discover at a technical level, there is a dedicated channel on the slack community for this. Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of kraemerf at de.ibm.com Sent: 14 April 2020 08:42 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 *FYI* -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Apr 15 16:29:53 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 15:29:53 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 16:36:48 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 17:36:48 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: Message-ID: <20200415153648.GK30439@ics.muni.cz> Hello, I noticed this bug, it took about 10 minutes to crash. However, I'm seeing similar NULL pointer dereference even with older kernels, That dereference does not happen always in GPFS code, sometimes outside in NFS or elsewhere, however it looks familiar. I have many crashdumps about this. On Wed, Apr 15, 2020 at 03:29:53PM +0000, Felipe Knop wrote: > All, > ? > A problem has been identified with Spectrum Scale when running on RHEL 7.7 > and kernel 3.10.0-1062.18.1.el7.? 
While a fix is being currently > developed, customers should not move up to this kernel level. > ? > The new kernel was issued on March 17 via the following errata:? > [1]https://access.redhat.com/errata/RHSA-2020:0834 > ? > When this kernel is used with Scale, system crashes have been observed. > The following are a couple of examples of kernel stack traces for the > crash: > ? > ? > [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000040 > [ 2915.633770] IP: [] > cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > [ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 [mmfslinux] > [ 2915.921381]? [] ? take_dentry_name_snapshot+0xf0/0xf0 > [ 2915.928760]? [] ? shrink_dcache_parent+0x60/0x90 > [ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > [ 2915.941388]? [] do_rmdir+0x1f1/0x220 > [ 2915.947119]? [] ? __fput+0x186/0x260 > [ 2915.952849]? [] ? ____fput+0xe/0x10 > [ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > [ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > ? > [1224278.495993] [] __dentry_kill+0x128/0x190 > [1224278.496678] [] dput+0xb6/0x1a0 > [1224278.497378] [] d_prune_aliases+0xb6/0xf0 > [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 > [mmfslinux] > [1224278.498798] [] > _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > ? > ? > RHEL 7.8 is also impacted by the same problem, but validation of Scale > with 7.8 is still under way. > ? > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From laurence.schuler at nasa.gov Wed Apr 15 16:49:59 2020 From: laurence.schuler at nasa.gov (Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]) Date: Wed, 15 Apr 2020 15:49:59 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: Message-ID: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Will this impact *any* version of Spectrum Scale? -Laurence From: on behalf of Felipe Knop Reply-To: gpfsug main discussion list Date: Wednesday, April 15, 2020 at 11:30 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel All, A problem has been identified with Spectrum Scale when running on RHEL 7.7 and kernel 3.10.0-1062.18.1.el7. While a fix is being currently developed, customers should not move up to this kernel level. The new kernel was issued on March 17 via the following errata: https://access.redhat.com/errata/RHSA-2020:0834 When this kernel is used with Scale, system crashes have been observed. The following are a couple of examples of kernel stack traces for the crash: [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [ 2915.633770] IP: [] cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] [ 2915.914097] [] gpfs_i_rmdir+0x29c/0x310 [mmfslinux] [ 2915.921381] [] ? take_dentry_name_snapshot+0xf0/0xf0 [ 2915.928760] [] ? shrink_dcache_parent+0x60/0x90 [ 2915.935656] [] vfs_rmdir+0xdc/0x150 [ 2915.941388] [] do_rmdir+0x1f1/0x220 [ 2915.947119] [] ? 
__fput+0x186/0x260 [ 2915.952849] [] ? ____fput+0xe/0x10 [ 2915.958484] [] ? task_work_run+0xc0/0xe0 [ 2915.964701] [] SyS_unlinkat+0x25/0x40 [1224278.495993] [] __dentry_kill+0x128/0x190 [1224278.496678] [] dput+0xb6/0x1a0 [1224278.497378] [] d_prune_aliases+0xb6/0xf0 [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 [mmfslinux] [1224278.498798] [] _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] RHEL 7.8 is also impacted by the same problem, but validation of Scale with 7.8 is still under way. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 9466 bytes Desc: not available URL: From knop at us.ibm.com Wed Apr 15 17:25:41 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 16:25:41 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> References: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov>, Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 17:35:12 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 18:35:12 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: <20200415163512.GP30439@ics.muni.cz> And are you sure it is present only in -1062.18.1.el7 kernel? I think it is present in all -1062.* kernels.. On Wed, Apr 15, 2020 at 04:25:41PM +0000, Felipe Knop wrote: > Laurence, > ? > The problem affects all the Scale releases / PTFs. > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > ? > ? > > ----- Original message ----- > From: "Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] [EXTERNAL] Kernel crashes with Spectrum > Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > Date: Wed, Apr 15, 2020 12:10 PM > ? > > Will this impact *any* version of Spectrum Scale? > > ? > > -Laurence > > ? > > From: on behalf of Felipe > Knop > Reply-To: gpfsug main discussion list > Date: Wednesday, April 15, 2020 at 11:30 AM > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum Scale > and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > ? > > All, > > ? > > A problem has been identified with Spectrum Scale when running on RHEL > 7.7 and kernel 3.10.0-1062.18.1.el7.? While a fix is being currently > developed, customers should not move up to this kernel level. > > ? > > The new kernel was issued on March 17 via the following errata:? > [1]https://access.redhat.com/errata/RHSA-2020:0834 > > ? > > When this kernel is used with Scale, system crashes have been observed. > The following are a couple of examples of kernel stack traces for the > crash: > > ? > > ? 
> > [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000040 > [ 2915.633770] IP: [] > cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > > [ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 > [mmfslinux] > [ 2915.921381]? [] ? > take_dentry_name_snapshot+0xf0/0xf0 > [ 2915.928760]? [] ? shrink_dcache_parent+0x60/0x90 > [ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > [ 2915.941388]? [] do_rmdir+0x1f1/0x220 > [ 2915.947119]? [] ? __fput+0x186/0x260 > [ 2915.952849]? [] ? ____fput+0xe/0x10 > [ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > [ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > > ? > > [1224278.495993] [] __dentry_kill+0x128/0x190 > [1224278.496678] [] dput+0xb6/0x1a0 > [1224278.497378] [] d_prune_aliases+0xb6/0xf0 > [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 > [mmfslinux] > [1224278.498798] [] > _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > > ? > > ? > > RHEL 7.8 is also impacted by the same problem, but validation of Scale > with 7.8 is still under way. > > ? > > ? > > ? Felipe > > ? > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > > ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [2]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > 2. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From knop at us.ibm.com Wed Apr 15 17:51:02 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 16:51:02 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <20200415163512.GP30439@ics.muni.cz> References: <20200415163512.GP30439@ics.muni.cz>, <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 18:06:57 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 19:06:57 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: <20200415163512.GP30439@ics.muni.cz> <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: <20200415170657.GQ30439@ics.muni.cz> Should I report then or just wait to fix 18.1 problem and see whether older ones are gone as well? On Wed, Apr 15, 2020 at 04:51:02PM +0000, Felipe Knop wrote: > Lukas, > ? > There was one particular kernel change introduced in 3.10.0-1062.18.1 that > has triggered a given set of crashes. It's possible, though, that there is > a lingering problem affecting older levels of 3.10.0-1062. I believe that > crashes occurring on older kernels should be treated as separate problems. > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > ? > ? 
> > ----- Original message ----- > From: Lukas Hejtmanek > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel crashes with Spectrum > Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > Date: Wed, Apr 15, 2020 12:35 PM > ? > And are you sure it is present only in -1062.18.1.el7 kernel? I think it > is > present in all -1062.* kernels.. > > On Wed, Apr 15, 2020 at 04:25:41PM +0000, Felipe Knop wrote: > > ? ?Laurence, > > ? ?? > > ? ?The problem affects all the Scale releases / PTFs. > > ? ?? > > ? ?? Felipe > > ? ?? > > ? ?---- > > ? ?Felipe Knop knop at us.ibm.com > > ? ?GPFS Development and Security > > ? ?IBM Systems > > ? ?IBM Building 008 > > ? ?2455 South Rd, Poughkeepsie, NY 12601 > > ? ?(845) 433-9314 T/L 293-9314 > > ? ?? > > ? ?? > > ? ?? > > > > ? ? ?----- Original message ----- > > ? ? ?From: "Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]" > > ? ? ? > > ? ? ?Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ? ? ?To: gpfsug main discussion list > > > ? ? ?Cc: > > ? ? ?Subject: Re: [gpfsug-discuss] [EXTERNAL] Kernel crashes with > Spectrum > > ? ? ?Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > ? ? ?Date: Wed, Apr 15, 2020 12:10 PM > > ? ? ?? > > > > ? ? ?Will this impact *any* version of Spectrum Scale? > > > > ? ? ?? > > > > ? ? ?-Laurence > > > > ? ? ?? > > > > ? ? ?From: on behalf of > Felipe > > ? ? ?Knop > > ? ? ?Reply-To: gpfsug main discussion list > > > ? ? ?Date: Wednesday, April 15, 2020 at 11:30 AM > > ? ? ?To: "gpfsug-discuss at spectrumscale.org" > > ? ? ? > > ? ? ?Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum > Scale > > ? ? ?and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > > > ? ? ?? > > > > ? ? ?All, > > > > ? ? ?? > > > > ? ? ?A problem has been identified with Spectrum Scale when running on > RHEL > > ? ? ?7.7 and kernel 3.10.0-1062.18.1.el7.? While a fix is being > currently > > ? ? ?developed, customers should not move up to this kernel level. > > > > ? ? ?? > > > > ? ? ?The new kernel was issued on March 17 via the following errata:? > > ? ? ?[1][1]https://access.redhat.com/errata/RHSA-2020:0834? > > > > ? ? ?? > > > > ? ? ?When this kernel is used with Scale, system crashes have been > observed. > > ? ? ?The following are a couple of examples of kernel stack traces for > the > > ? ? ?crash: > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?[ 2915.625015] BUG: unable to handle kernel NULL pointer > dereference at > > ? ? ?0000000000000040 > > ? ? ?[ 2915.633770] IP: [] > > ? ? ?cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > > > > ? ? ?[ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 > > ? ? ?[mmfslinux] > > ? ? ?[ 2915.921381]? [] ? > > ? ? ?take_dentry_name_snapshot+0xf0/0xf0 > > ? ? ?[ 2915.928760]? [] ? > shrink_dcache_parent+0x60/0x90 > > ? ? ?[ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > > ? ? ?[ 2915.941388]? [] do_rmdir+0x1f1/0x220 > > ? ? ?[ 2915.947119]? [] ? __fput+0x186/0x260 > > ? ? ?[ 2915.952849]? [] ? ____fput+0xe/0x10 > > ? ? ?[ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > > ? ? ?[ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > > > > ? ? ?? > > > > ? ? ?[1224278.495993] [] __dentry_kill+0x128/0x190 > > ? ? ?[1224278.496678] [] dput+0xb6/0x1a0 > > ? ? ?[1224278.497378] [] d_prune_aliases+0xb6/0xf0 > > ? ? ?[1224278.498083] [] > cxiPruneDCacheEntry+0x13a/0x1c0 > > ? ? ?[mmfslinux] > > ? ? ?[1224278.498798] [] > > ? ? ?_ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? 
?RHEL 7.8 is also impacted by the same problem, but validation of > Scale > > ? ? ?with 7.8 is still under way. > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?? Felipe > > > > ? ? ?? > > > > ? ? ?---- > > ? ? ?Felipe Knop knop at us.ibm.com > > ? ? ?GPFS Development and Security > > ? ? ?IBM Systems > > ? ? ?IBM Building 008 > > ? ? ?2455 South Rd, Poughkeepsie, NY 12601 > > ? ? ?(845) 433-9314 T/L 293-9314 > > ? ? ?? > > > > ? ? ?? > > ? ? ?_______________________________________________ > > ? ? ?gpfsug-discuss mailing list > > ? ? ?gpfsug-discuss at spectrumscale.org > > ? ? ?[2][2]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > > > ? ?? > > > > References > > > > ? ?Visible links > > ? ?1. [3]https://access.redhat.com/errata/RHSA-2020:0834? > > ? ?2. [4]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > ??Full Time Multitasking Ninja > ??is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [6]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > ? > > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > 2. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 3. https://access.redhat.com/errata/RHSA-2020:0834 > 4. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 5. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 6. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From knop at us.ibm.com Wed Apr 15 19:17:15 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 18:17:15 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <20200415170657.GQ30439@ics.muni.cz> References: <20200415170657.GQ30439@ics.muni.cz>, <20200415163512.GP30439@ics.muni.cz><8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Thu Apr 16 04:26:36 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Thu, 16 Apr 2020 03:26:36 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing Message-ID: Hello All, As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). We would really like to stay with SS/GPFS and have been big advocates of SS/GPFS over the years, but the capacity based licensing is pushing us into evaluating alternatives. I realize this may not be proper to discuss this directly in this email list, so feel free to email directly with your suggestions or your plans. Thanks and kind regards, Dean -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Thu Apr 16 09:16:59 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 16 Apr 2020 08:16:59 +0000 Subject: [gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected Message-ID: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> Hello, I'm puzzled about the difference between the two mmhealth events longwaiters_found ERROR Detected Spectrum Scale long-waiters and deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock Especially, why does the latter have level WARNING only while the first has level ERROR? Longwaiters_found is based on the output of 'mmdiag --deadlock' and occurs much more often on our clusters, while the latter probably is triggered by an external event and not an internal mmsysmon check? Deadlock detection is handled by mmfsd? Whenever a deadlock is detected some debug data is collected, which is not true for longwaiters_found. Hm, so why is no deadlock detected whenever 'mmdiag --deadlock' shows waiting threads? Shouldn't the severities be the other way around? Finally: can we trigger some debug data collection whenever a longwaiters_found event happens? Just getting the output of 'mmdiag --deadlock' on the affected node could give some hints. Without it, I don't see any real chance to take any action. Thank you, Heiner -- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From Anna.Greim at de.ibm.com Thu Apr 16 11:55:56 2020 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 16 Apr 2020 12:55:56 +0200 Subject: [gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected In-Reply-To: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> References: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> Message-ID: Hi Heiner, I'm not really able to give you insight into the decision behind the events' severities. Maybe somebody else is able to answer here. But regarding your question about triggering debug data collection, please have a look at this documentation page: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm This feature has been in the product since the 5.0.x versions and should be helpful here. It will trigger your eventsCallback script when the event is raised. One of the script's arguments is the event name, so it is possible to create a script that checks for the event name longwaiters_found, runs 'mmdiag --deadlock' and writes the output into a text file. The script call has a hard timeout of 60 seconds so that it does not interfere too much with the mmsysmon internals, but a run time of less than 1 second would be better.
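A minimal sketch of such a callback script follows. This is only an illustration, not IBM-provided code: it assumes the /var/mmfs/etc/eventsCallback location described on that documentation page, it makes no assumption about the argument order and simply scans all arguments for the event name, and the output directory is a hypothetical choice.

#!/bin/bash
# /var/mmfs/etc/eventsCallback - sketch of a callback invoked by mmsysmon
# when a health event is raised. It scans its arguments for the event name
# instead of assuming a fixed argument position.

EVENT="longwaiters_found"
OUTDIR="/var/log/gpfs-events"        # hypothetical output directory

for arg in "$@"; do
    if [ "$arg" = "$EVENT" ]; then
        mkdir -p "$OUTDIR"
        # Capture the long-waiter details at the moment the event fired.
        # Run in the background so the callback returns well within the
        # 60-second limit mentioned above.
        /usr/lpp/mmfs/bin/mmdiag --deadlock \
            > "$OUTDIR/mmdiag-deadlock.$(hostname -s).$(date +%Y%m%d-%H%M%S).txt" 2>&1 &
        break
    fi
done
exit 0

The script would need to be executable and present on every node that should react to the event; because mmdiag runs in the background, the callback itself returns almost immediately.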
Mit freundlichen Gr??en / Kind regards Anna Greim Software Engineer, Spectrum Scale Development IBM Systems IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 16/04/2020 10:36 Subject: [EXTERNAL] [gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I?m puzzled about the difference between the two mmhealth events longwaiters_found ERROR Detected Spectrum Scale long-waiters and deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock Especially why the later has level WARNING only while the first has level ERROR? Longwaiters_found is based on the output of ?mmdiag ?deadlock? and occurs much more often on our clusters, while the later probably is triggered by an external event and no internal mmsysmon check? Deadlock detection is handled by mmfsd? Whenever a deadlock is detected some debug data is collected, which is not true for longwaiters_detected. Hm, so why is no deadlock detected whenever mmdiag ?deadlock shows waiting threads? Shouldn?t the severity be the opposite way? Finally: Can we trigger some debug data collection whenever a longwaiters_found event happens ? just getting the output of ?mmdiag ?deadlock? on the single node could give some hints. Without I don?t see any real chance to take any action. Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=XLDdnBDnIn497KhM7_npStR6ig1r198VHeSBY1WbuHc&m=QAa_5ZRNpy310ikXZzwunhWU4TGKsH_NWDoYwh57MNo&s=dKWX1clbfClbfJb5yKSzhoNC1aqCbT6-7s1DQdx8CzY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Apr 16 13:44:14 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 16 Apr 2020 12:44:14 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: Folks, I need to correct a common misunderstanding that is perpetuated here: > As IBM has completely switched to capacity based licensing in order to use SS v5 For new customers, Scale is priced Per TB (we also have Per PB licenses now for convenience). This transition was completed in January 2019. And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs. Existing customers with Standard sockets can remain on and continue to buy more Standard sockets. There is no plan to end that entitlement. The same applies to customers with Advanced sockets who want to continue with Advanced. In both cases you can upgrade from V4.2 to V5.0 without changing your license metric. This licensing change is not connected to the migration from V4 to V5. However, I do see a lot of confusion around this point, including from my IBM colleagues, possibly because both transitions occurred around roughly the same time period. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com ? 
From dean.flanders at fmi.ch Thu Apr 16 14:00:49 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Thu, 16 Apr 2020 13:00:49 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hello Carl, Yes, for existing IBM direct customers that may have been the case for v4 to v5. However, from my understanding, if a customer bought GPFS/SS via DDN, Lenovo, etc. with embedded systems licenses, this is not the case. From my understanding, existing customers from DDN, Lenovo, etc. that have v4 with socket-based licenses are not entitled to v5 socket licenses. Is that a correct understanding? Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 2:44 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Folks, I need to correct a common misunderstanding that is perpetuated here: > As IBM has completely switched to capacity based licensing in order to > use SS v5 For new customers, Scale is priced Per TB (we also have Per PB licenses now for convenience). This transition was completed in January 2019. And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs. Existing customers with Standard sockets can remain on and continue to buy more Standard sockets. There is no plan to end that entitlement. The same applies to customers with Advanced sockets who want to continue with Advanced. In both cases you can upgrade from V4.2 to V5.0 without changing your license metric. This licensing change is not connected to the migration from V4 to V5. However, I do see a lot of confusion around this point, including from my IBM colleagues, possibly because both transitions occurred around roughly the same time period. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Thu Apr 16 17:32:29 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 16 Apr 2020 12:32:29 -0400 Subject: [gpfsug-discuss] gpfs filesets question Message-ID: I have filesets set up in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I believe these are dependent filesets, dependent on the root fileset. Anyhow, a user wants to move a large amount of data from one fileset to another. Would this be a metadata-only operation? He has attempted a small amount of data and has noticed some thrashing. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stockf at us.ibm.com Thu Apr 16 18:11:40 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 16 Apr 2020 17:11:40 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Apr 16 18:36:35 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 16 Apr 2020 13:36:35 -0400 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: Hi Fred: I do. I have 3 pools. system, ssd data pool(fc_ssd400G) and a spinning disk pool(fc_8T). I want to think the ssd_data_pool is empty at the moment and the system pool is ssd and only contains metadata. [root at cl005 ~]# mmdf home -P fc_ssd400G disk disk size failure holds holds free KB free KB name in KB group metadata data in full blocks in fragments --------------- ------------- -------- -------- ----- -------------------- ------------------- Disks in storage pool: fc_ssd400G (Maximum disk size allowed is 97 TB) r10f1e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e7 1924720640 1001 No Yes 1924636672 (100%) 17408 ( 0%) r10f1e6 1924720640 1001 No Yes 1924636672 (100%) 17664 ( 0%) r10f1e5 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) ------------- -------------------- ------------------- (pool total) 13473044480 13472497664 (100%) 83712 ( 0%) More or less empty. Interesting... On Thu, Apr 16, 2020 at 1:11 PM Frederick Stock wrote: > Do you have more than one GPFS storage pool in the system? If you do and > they align with the filesets then that might explain why moving data from > one fileset to another is causing increased IO operations. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: "J. Eric Wonderley" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question > Date: Thu, Apr 16, 2020 12:32 PM > > I have filesets setup in a filesystem...looks like: > [root at cl005 ~]# mmlsfileset home -L > Filesets in file system 'home': > Name Id RootInode ParentId Created > InodeSpace MaxInodes AllocInodes Comment > root 0 3 -- Tue Jun 30 > 07:54:09 2015 0 402653184 320946176 root fileset > hess 1 543733376 0 Tue Jun 13 > 14:56:13 2017 0 0 0 > predictHPC 2 1171116 0 Thu Jan 5 > 15:16:56 2017 0 0 0 > HYCCSIM 3 544258049 0 Wed Jun 14 > 10:00:41 2017 0 0 0 > socialdet 4 544258050 0 Wed Jun 14 > 10:01:02 2017 0 0 0 > arc 5 1171073 0 Thu Jan 5 > 15:07:09 2017 0 0 0 > arcadm 6 1171074 0 Thu Jan 5 > 15:07:10 2017 0 0 0 > > I beleive these are dependent filesets. Dependent on the root fileset. > Anyhow a user wants to move a large amount of data from one fileset to > another. Would this be a metadata only operation? He has attempted to > small amount of data and has noticed some thrasing. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Apr 16 18:55:09 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 16 Apr 2020 17:55:09 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Apr 16 19:25:33 2020 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 16 Apr 2020 18:25:33 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: If my memory serves? any move of files between filesets requires data to be moved, regardless of pool allocation for the files that need to be moved, and regardless if they are dependent filesets are both in the same independent fileset. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of J. Eric Wonderley Sent: Thursday, April 16, 2020 12:37 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs filesets question [EXTERNAL EMAIL] Hi Fred: I do. I have 3 pools. system, ssd data pool(fc_ssd400G) and a spinning disk pool(fc_8T). I want to think the ssd_data_pool is empty at the moment and the system pool is ssd and only contains metadata. [root at cl005 ~]# mmdf home -P fc_ssd400G disk disk size failure holds holds free KB free KB name in KB group metadata data in full blocks in fragments --------------- ------------- -------- -------- ----- -------------------- ------------------- Disks in storage pool: fc_ssd400G (Maximum disk size allowed is 97 TB) r10f1e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e7 1924720640 1001 No Yes 1924636672 (100%) 17408 ( 0%) r10f1e6 1924720640 1001 No Yes 1924636672 (100%) 17664 ( 0%) r10f1e5 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) ------------- -------------------- ------------------- (pool total) 13473044480 13472497664 (100%) 83712 ( 0%) More or less empty. Interesting... On Thu, Apr 16, 2020 at 1:11 PM Frederick Stock > wrote: Do you have more than one GPFS storage pool in the system? If you do and they align with the filesets then that might explain why moving data from one fileset to another is causing increased IO operations. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "J. 
Eric Wonderley" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question Date: Thu, Apr 16, 2020 12:32 PM I have filesets setup in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Apr 16 17:50:42 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 16 Apr 2020 16:50:42 +0000 Subject: [gpfsug-discuss] gpfs filesets question Message-ID: Moving data between filesets is like moving files between file systems. Normally when you move files between directories, it?s simple metadata, but with filesets (dependent or independent) is a full copy and delete of the old data. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "J. Eric Wonderley" Reply-To: gpfsug main discussion list Date: Thursday, April 16, 2020 at 11:32 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question I have filesets setup in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Apr 16 21:24:51 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 16 Apr 2020 20:24:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: > From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses >are not entitled v5 licenses socket licenses. Is that a correct understanding? 
It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com From mhennecke at lenovo.com Thu Apr 16 22:19:13 2020 From: mhennecke at lenovo.com (Michael Hennecke) Date: Thu, 16 Apr 2020 21:19:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - Lenovo information Message-ID: Hi, Thanks a lot Carl for these clarifications. Some additions from the Lenovo side: Lenovo *GSS* (which is no longer sold, but still fully supported) uses the socked-based Spectrum Scale Standard Edition or Advanced Edition. We provide both a 4.2 based version and a 5.0 based version of the GSS installation packages. Customers get access to the Edition they acquired with their GSS system(s), and they can choose to install the 4.2 or the 5.0 code. Lenovo GSS customers are automatically entitled for those GSS downloads. 
Customers who acquired a GSS system when System x was still part of IBM can also obtain the latest GSS installation packages from Lenovo (v4 and v5), but will need to provide a valid proof of entitlement of their Spectrum Scale licenses before being granted access. Lenovo *DSS-G* uses capacity-based licensing (per-disk or per-TB), with the Spectrum Scale Data Access Edition or Data Management Edition. For DSS-G we also provide both a 4.2 based installation package and a 5.0 based installation package, and customers can choose which one to install. Note that the Lenovo installation tarballs for DSS-G are named for example "dss-g-2.6a-standard-5.0.tgz" (installation package includes the capacity-based DAE) or "dss-g-2.6a-advanced-5.0.tgz" (installation package includes the capacity-based DME), so the Lenovo naming convention for the DSS-G packages is not identical with the naming of the Scale Edition that it includes. PS: There is no path to change a GSS system from a socket license to a capacity license. Replacing it with a DSS-G will of course also replace the licenses, as DSS-G comes with capacity-based licenses. Mit freundlichen Gr?ssen / Best regards, Michael Hennecke HPC Chief Technologist - HPC and AI Business Unit? -- Lenovo Global Technology (Germany) GmbH * Am Zehnthof 77 * D-45307 Essen * Germany Gesch?ftsf?hrung: Colm Gleeson, Christophe Laurent * Sitz der Gesellschaft: Stuttgart * HRB-Nr.: 758298, AG Stuttgart -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: [External] Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. 
If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Fri Apr 17 00:48:18 2020 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Thu, 16 Apr 2020 23:48:18 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: Plus one. It is not just volume licensing. The socket licensing costs have gone through the roof, at least in Australia. IBM tempts you with a cheap introduction and then once you are hooked, ramps up the price. They are counting on the migration costs outweighing the licensing fee increases. Unfortunately, our management won't stand for this business approach, so we get to do the migrations (boring as the proverbial bat ... you know what.) I think this forum is a good place to discuss it. IBM and customers on here need to know all about it. It is a user group after all and moving away from a product is part of the lifecycle. We were going to use GPFS for HPC scratch but went to market and ended up with BeeGFS. Further pricing pressure has meant GPFS is being phased out in all areas. We split our BeeGFS cluster of NVMe servers in half on arrival and have been trying other filesystems on half of it. We were going to try GPFS ECE but given the pricing we have been quoted have decided not to waste our time. We are gearing up to try Lustre on it. We have also noted the feature improvements with Lustre. Maybe if IBM had saved the money that a rebranding costs (GPFS to Spectrum Scale) they would not have had to crank up the price of GPFS? Cheers, Greg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 1:27 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale licensing Hello All, As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). We would really like to stay with SS/GPFS and have been big advocates of SS/GPFS over the years, but the capacity based licensing is pushing us into evaluating alternatives. I realize this may not be proper to discuss this directly in this email list, so feel free to email directly with your suggestions or your plans. 
Thanks and kind regards, Dean -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Fri Apr 17 01:40:22 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Fri, 17 Apr 2020 00:40:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. 
Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sedl at re-store.net Fri Apr 17 03:06:57 2020 From: sedl at re-store.net (Michael Sedlmayer) Date: Fri, 17 Apr 2020 02:06:57 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. 
The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From steve.hindmarsh at crick.ac.uk Fri Apr 17 08:35:51 2020 From: steve.hindmarsh at crick.ac.uk (Steve Hindmarsh) Date: Fri, 17 Apr 2020 07:35:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: , Message-ID: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> We are caught in the same position (12 PB on DDN GridScaler) and currently unable to upgrade to v5. If the position between IBM and DDN can?t be resolved, an extension of meaningful support from IBM (i.e. critical patches not just a sympathetic ear) for OEM licences would make a *huge* difference to those of us who need to provide critical production research data services on current equipment for another few years at least - with appropriate paid vendor support of course. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute Sent from my mobile On 17 Apr 2020, at 03:07, Michael Sedlmayer wrote: ?One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. 
To my knowledge, DDN has not been a reseller of IBM licenses. We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. 
If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Apr 17 09:19:39 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 17 Apr 2020 08:19:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> References: , , <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> Message-ID: Especially with the pandemic. No one is exactly sure what next year?s budget is going to look like. I wouldn?t expect to be buying large amounts of storage to replace so far perfectly good storage. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Apr 17, 2020, at 03:36, Steve Hindmarsh wrote: ? We are caught in the same position (12 PB on DDN GridScaler) and currently unable to upgrade to v5. If the position between IBM and DDN can?t be resolved, an extension of meaningful support from IBM (i.e. 
critical patches not just a sympathetic ear) for OEM licences would make a *huge* difference to those of us who need to provide critical production research data services on current equipment for another few years at least - with appropriate paid vendor support of course. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute Sent from my mobile On 17 Apr 2020, at 03:07, Michael Sedlmayer wrote: ?One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. 
This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.horton at icr.ac.uk Fri Apr 17 10:29:52 2020 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 17 Apr 2020 09:29:52 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> References: , <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> Message-ID: We're in the same boat. I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) but I really wish they would sort something out. Rob On Fri, 2020-04-17 at 07:35 +0000, Steve Hindmarsh wrote: > CAUTION: This email originated from outside of the ICR. Do not click > links or open attachments unless you recognize the sender's email > address and know the content is safe. > > We are caught in the same position (12 PB on DDN GridScaler) and > currently unable to upgrade to v5. > > If the position between IBM and DDN can?t be resolved, an extension > of meaningful support from IBM (i.e. critical patches not just a > sympathetic ear) for OEM licences would make a *huge* difference to > those of us who need to provide critical production research data > services on current equipment for another few years at least - with > appropriate paid vendor support of course. > > Best, > Steve > > Steve Hindmarsh > Head of Scientific Computing > The Francis Crick Institute -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From yeep at robust.my Fri Apr 17 11:31:49 2020 From: yeep at robust.my (T.A. Yeep) Date: Fri, 17 Apr 2020 18:31:49 +0800 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hi Carl, I'm confused here, in the previous email it was said *And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs.* But then you mentioned in below email that: But new customers and new OEM systems are *all licensed by Capacity. This also applies to IBM's own ESS*: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with *a new ESS, that will come with capacity licenses*. Now the question, ESS is license per Drive or by capacity? .On Fri, Apr 17, 2020 at 4:25 AM Carl Zetie - carlz at us.ibm.com < carlz at us.ibm.com> wrote: > > From my understanding existing customers from DDN, Lenovo, etc. that > have v4 with socket based licenses > >are not entitled v5 licenses socket licenses. Is that a correct > understanding? > > It is not, and I apologize in advance for the length of this explanation. > I want to be precise and as transparent as possible while respecting the > confidentiality of our OEM partners and the contracts we have with them, > and there is a lot of misinformation out there. > > The short version is that the same rules apply to DDN, Lenovo, and other > OEM systems that apply to IBM ESS. 
You can update your system in place and > keep your existing metric, as long as your vendor can supply you with V5 > for that hardware. The update from V4 to V5 is not relevant. > > > The long version: > > We apply the same standard to our OEM's systems as to our own ESS: they > can upgrade their existing customers on their existing OEM systems to V5 > and stay on Sockets, *provided* that the OEM has entered into an OEM > license for Scale V5 and can supply it, and *provided* that the hardware is > still supported by the software stack. But new customers and new OEM > systems are all licensed by Capacity. This also applies to IBM's own ESS: > you can keep upgrading your old (if hardware is supported) gen 1 ESS on > Sockets, but if you replace it with a new ESS, that will come with capacity > licenses. (Lenovo may want to chime in about their own GSS customers here, > who have Socket licenses, and DSS-G customers, who have Capacity licenses). > Existing systems that originally shipped with Socket licenses are > "grandfathered in". > > And of course, if you move from a Lenovo system to an IBM system, or from > an IBM system to a Lenovo system, or any other change of suppliers, that > new system will come with capacity licenses, simply because it's a new > system. If you're replacing an old system running with V4 with a new one > running V5 it might look like you are forced to switch to update, but > that's not the case: if you replace an old "grandfathered in" system that > you had already updated to V5 on Sockets, your new system would *still* > come with Capacity licenses - again, because it's a new system. > > Now where much of the confusion occurs is this: What if your supplier does > not provide an update to V5 at all, *neither as Capacity nor Socket > licenses*? Then you have no choice: to get to V5, you have to move to a new > supplier, and consequently you have to move to Capacity licensing. But once > again, it's not that moving from V4 to V5 requires a change of metric; it's > moving to a new system from a new supplier. > > I hope that helps to make things clearer. > > > > Carl Zetie > Program Director > Offering Management > Spectrum Scale > ---- > (919) 473 3318 ][ Research Triangle Park > carlz at us.ibm.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. Yeep*Mobile: 016-719 8506 | Tel/Fax: 03-6261 7237 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Apr 17 11:50:22 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 11:50:22 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: On 17/04/2020 11:31, T.A. Yeep wrote: > Hi Carl, > > I'm confused here, in the previous email it was said *And for ESS, it is > licensed?Per Drive with different prices for HDDs and SSDs.* > > But then you mentioned in below email that: > But new customers and new OEM systems are *all licensed by Capacity. > This also applies to IBM's own ESS*: you can keep upgrading your old (if > hardware is supported) gen 1 ESS on Sockets, but if you replace it with > *a new ESS, that will come with capacity licenses*. > > Now the question, ESS is license per Drive or by capacity? 
> Well by drive is "capacity" based licensing unless you have some sort of magical infinite capacity drives :-) Under the PVU scheme if you know what you are doing you could game the system. For example get a handful of servers get PVU licenses for them create a GPFS file system handing off the back using say Fibre Channel and cheap FC attached arrays (Dell MD3000 series springs to mind) and then hang many PB off the back. I could using this scheme create a 100PB filesystem for under a thousand PVU of GPFS server licenses. Add in another cluster for protocol nodes and if you are not mounting on HPC nodes that's a winner :-) In a similar manner I use a pimped out ancient Dell R300 with dual core Xeon for backing up my GPFS filesystem because it's 100PVU of TSM licensing and I am cheap, and besides it is more than enough grunt for the job. A new machine would be 240 PVU minimum (4*70). I plan on replacing the PERC SAS6 card with a H710 and new internal cabling to run RHEL8 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Fri Apr 17 12:02:44 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 12:02:44 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: On 16/04/2020 04:26, Flanders, Dean wrote: > Hello All, > > As IBM has completely switched to capacity based licensing in order to > use SS v5 I was wondering how others are dealing with this? We do not > find the capacity based licensing sustainable. Our long term plan is to > migrate away from SS v5 to Lustre, and based on the Lustre roadmap we > have seen it should have the features we need within the next ~1 year > (we are fortunate to have good contacts). The problem is the features of Lustre that are missing in GPFS :-) For example have they removed the Lustre feature where roughly biannually the metadata server kernel panics introducing incorrectable corruption into the file system that will within six months cause constant crashes of the metadata node to the point where the file system is unusable? In best slashdot car analogy GPFS is like driving round in a Aston Martin DB9, where Lustre is like having a Ford Pinto. You will never be happy with Pinto in my experience having gone from the DB9 to the Pinto and back to the DB9. That said if you use Lustre as a high performance scratch file system fro HPC and every ~6 months do a shutdown and upgrade, and at the same time reformat your Lustre file system you will be fine. Our experience with Lustre was so bad we specifically excluded it as an option for our current HPC system when it went out to tender. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From carlz at us.ibm.com Fri Apr 17 13:10:00 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 12:10:00 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <19F21F2C-901E-4A04-AB94-740E2C2B5205@us.ibm.com> >Now the question, ESS is license per Drive or by capacity? I apologize for the confusion. Within IBM Storage when we say ?capacity? licensing we use that as an umbrella term for both Per TB/PB *or* Per Drive (HDD or SSD). This is contrasted with ?processor? metrics including Socket and the even older PVU licensing. 
And yes, we IBMers should be more careful about our tendency to use terminology that nobody else in the world does. (Don?t get me started on terabyte versus tebibyte?). So, for the sake of completeness and for anybody reviewing the thread in the future: * Per Drive is available with ESS, Lenovo DSS, and a number of other OEM solutions*. * Per TB/Per PB is available for software defined storage, including some OEM solutions - basically anywhere where figuring out the number of physical drives is infeasible.** * You can if you wish license ESS with Per TB/PB, for example if you want to have a single pool of licensing across an environment that mixes software-defined, ESS, or public cloud; or if you want to include your ESS licenses in an ELA. This is almost always more expensive than Per Drive, but some customers are willing to pay for the privilege of the flexibility. I hope that helps. *(In some cases the customer may not even know it because the OEM solution is sold as a whole with a bottom line price, and the customer does not see a line item price for Scale. In at least one case, the vertical market solution doesn?t even expose the fact that the storage is provided by Scale.) **(Imagine trying to figure out the ?real? number of drives in a high-end storage array that does RAIDing, hides some drives as spares, offers thin provisioning, etc. Or on public cloud where the ?drives? are all virtual.) Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1886717044] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From carlz at us.ibm.com Fri Apr 17 13:16:38 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 12:16:38 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> Rob Horton wrote: >I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) >but I really wish they would sort something out. Yes, it?s a pain. IBM and DDN are trying very hard to work something out, but it?s hard to get all the ?I?s dotted and ?T?s crossed with our respective legal and exec reviewers so that when we do say something it will be complete, clear, and final; and not require long, baroque threads for people to figure out where exactly they are? I wish I could say more, but I need to respect the confidentiality of the relationship and the live discussion. In the meantime, I thank you for your patience, and ask that you not believe any rumors you might hear, because whatever they are, they are wrong (or at least incomplete). In this situation, as a wise man once observed, ?those who Say don?t Know; those who Know don?t Say?. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_749317756] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From aaron.knister at gmail.com Fri Apr 17 14:15:07 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 17 Apr 2020 09:15:07 -0400 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: Yeah, I had similar experiences in the past (over a decade ago) with Lustre and was heavily heavily anti-Lustre. That said, I just finished several weeks of what I?d call grueling testing of DDN Lustre and GPFS on the same hardware and I?m reasonably convinced much of that is behind us now (things like stability, metadata performance, random I/O performance just don?t appear to be issues anymore and in some cases these operations are now faster in Lustre). Full disclosure, I work for DDN, but the source of my paycheck has relatively little bearing on my technical opinions. All I?m saying is for me to honestly believe Lustre is worth another shot after the experiences I had years ago is significant. I do think it?s key to have a vendor behind you, vs rolling your own. I have seen that make a difference. I?m happy to take any further conversation/questions offline, I?m in no way trying to turn this into a marketing campaign. Sent from my iPhone > On Apr 17, 2020, at 07:02, Jonathan Buzzard wrote: > > ?On 16/04/2020 04:26, Flanders, Dean wrote: >> Hello All, >> As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). > > The problem is the features of Lustre that are missing in GPFS :-) > > For example have they removed the Lustre feature where roughly biannually the metadata server kernel panics introducing incorrectable corruption into the file system that will within six months cause constant crashes of the metadata node to the point where the file system is unusable? > > In best slashdot car analogy GPFS is like driving round in a Aston Martin DB9, where Lustre is like having a Ford Pinto. You will never be happy with Pinto in my experience having gone from the DB9 to the Pinto and back to the DB9. > > That said if you use Lustre as a high performance scratch file system fro HPC and every ~6 months do a shutdown and upgrade, and at the same time reformat your Lustre file system you will be fine. > > Our experience with Lustre was so bad we specifically excluded it as an option for our current HPC system when it went out to tender. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From carlz at us.ibm.com Fri Apr 17 14:15:07 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 13:15:07 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <82819CD0-0BF7-41A6-9896-32AF88744D4B@us.ibm.com> Dean Flanders: > Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, > but this is the first time I have seen the term "existing systems". 
However, it seems what I said before is mostly correct, > eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). > In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when > we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking > of these issues in their long term planning. Again, this isn?t quite correct, and I really want the archive of this thread to be completely correct when people review it in the future. As an existing customer of DDN, the problem GridScaler customers in particular are facing is not Sockets vs. Capacity. It is simply that DDN is not an OEM licensee for Scale V5. So DDN cannot upgrade your GridScaler to V5, *neither on Sockets nor on Capacity*. Then if you go to another supplier for V5, you are a new customer to that supplier. (Some of you out there are, I know, multi-sourcing your Scale systems, so may be an ?existing customer? of several Scale suppliers). And again, it is not correct that eventually all customers will be forced to capacity licensing. Those of you on Scale Standard and Scale Advanced software, which are not tied to specific systems or hardware, can continue on those licenses. There is no plan to require those people to migrate. By contrast, OEM licenses (and ESS licenses) were always sold as part of a system and attached to that system -- one of the things that makes those licenses cheaper than software licenses that live forever and float from system to system. It is also not true that there is a ?diminishing number of OEMs? selling V5. Everybody that sold V4 has added V5 to their contract, as far as I am aware -- except DDN. And we have added a number of additional OEMs in the past couple of years (some of them quite invisibly as Scale is embedded deep in their solution and they want their own brand front and center) and a couple more big names are in development that I can?t mention until they are ready to announce themselves. We also have a more diverse OEM model: as well as storage vendors that include Scale in a storage solution, we have various embedded vertical solutions, backup solutions, and cloud-based service offerings using Scale. Even Dell is selling a Scale solution now via our OEM Arcastream. Again, DDN and IBM are working together to find a path forward for GridScaler owners to get past this problem, and once again I ask for your patience as we get the details right. Regards Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_50537] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From steve.hindmarsh at crick.ac.uk Fri Apr 17 14:33:10 2020 From: steve.hindmarsh at crick.ac.uk (Steve Hindmarsh) Date: Fri, 17 Apr 2020 13:33:10 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> References: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> Message-ID: Hi Carl, Thanks for the update which is very encouraging. I?m happy to sit tight and wait for an announcement. 
Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Carl Zetie - carlz at us.ibm.com Sent: Friday, April 17, 2020 1:16:38 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Rob Horton wrote: >I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) >but I really wish they would sort something out. Yes, it?s a pain. IBM and DDN are trying very hard to work something out, but it?s hard to get all the ?I?s dotted and ?T?s crossed with our respective legal and exec reviewers so that when we do say something it will be complete, clear, and final; and not require long, baroque threads for people to figure out where exactly they are? I wish I could say more, but I need to respect the confidentiality of the relationship and the live discussion. In the meantime, I thank you for your patience, and ask that you not believe any rumors you might hear, because whatever they are, they are wrong (or at least incomplete). In this situation, as a wise man once observed, ?those who Say don?t Know; those who Know don?t Say?. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_749317756] The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From jonathan.buzzard at strath.ac.uk Fri Apr 17 14:44:29 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 14:44:29 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> On 17/04/2020 14:15, Aaron Knister wrote: > Yeah, I had similar experiences in the past (over a decade ago) with > Lustre and was heavily heavily anti-Lustre. That said, I just > finished several weeks of what I?d call grueling testing of DDN > Lustre and GPFS on the same hardware and I?m reasonably convinced > much of that is behind us now (things like stability, metadata > performance, random I/O performance just don?t appear to be issues > anymore and in some cases these operations are now faster in Lustre). Several weeks testing frankly does not cut the mustard to demonstrate stability. Our Lustre would run for months on end then boom, metadata server kernel panics. Sometimes but not always this would introduce the incorrectable file system corruption. You are going to need to have several years behind it to claim it is now stable. At this point I would note that basically a fsck on Lustre is not possible. Sure there is a somewhat complicated procedure for it, but firstly it is highly likely to take weeks to run, and even then it might not be able to actually fix the problem. > Full disclosure, I work for DDN, but the source of my paycheck has > relatively little bearing on my technical opinions. All I?m saying is > for me to honestly believe Lustre is worth another shot after the > experiences I had years ago is significant. 
I do think it?s key to > have a vendor behind you, vs rolling your own. I have seen that make > a difference. I?m happy to take any further conversation/questions > offline, I?m in no way trying to turn this into a marketing > campaign. Lustre is as of two years ago still behind GPFS 3.0 in terms of features and stability in my view. The idea it has caught up to GPFS 5.x in the last two years is in my view errant nonsense, software development just does not work like that. Let me put it another way, in our experience the loss of compute capacity from the downtime of Lustre exceeded the cost of GPFS licenses. That excludes the wage costs of researches twiddling their thumbs whilst the system was restored to working order. If I am being cynical if you can afford DDN storage in the first place stop winging about GPFS license costs. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From christophe.darras at atempo.com Fri Apr 17 15:00:10 2020 From: christophe.darras at atempo.com (Christophe Darras) Date: Fri, 17 Apr 2020 14:00:10 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> References: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> Message-ID: Hey Ladies and Gent, For some people here, it seems GPFS is like a religion? A lovely weekend to all of you, Kind Regards, Chris -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: vendredi 17 avril 2020 14:44 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing On 17/04/2020 14:15, Aaron Knister wrote: > Yeah, I had similar experiences in the past (over a decade ago) with > Lustre and was heavily heavily anti-Lustre. That said, I just finished > several weeks of what I?d call grueling testing of DDN Lustre and GPFS > on the same hardware and I?m reasonably convinced much of that is > behind us now (things like stability, metadata performance, random I/O > performance just don?t appear to be issues anymore and in some cases > these operations are now faster in Lustre). Several weeks testing frankly does not cut the mustard to demonstrate stability. Our Lustre would run for months on end then boom, metadata server kernel panics. Sometimes but not always this would introduce the incorrectable file system corruption. You are going to need to have several years behind it to claim it is now stable. At this point I would note that basically a fsck on Lustre is not possible. Sure there is a somewhat complicated procedure for it, but firstly it is highly likely to take weeks to run, and even then it might not be able to actually fix the problem. > Full disclosure, I work for DDN, but the source of my paycheck has > relatively little bearing on my technical opinions. All I?m saying is > for me to honestly believe Lustre is worth another shot after the > experiences I had years ago is significant. I do think it?s key to > have a vendor behind you, vs rolling your own. I have seen that make a > difference. I?m happy to take any further conversation/questions > offline, I?m in no way trying to turn this into a marketing campaign. Lustre is as of two years ago still behind GPFS 3.0 in terms of features and stability in my view. The idea it has caught up to GPFS 5.x in the last two years is in my view errant nonsense, software development just does not work like that. 
Let me put it another way, in our experience the loss of compute capacity from the downtime of Lustre exceeded the cost of GPFS licenses. That excludes the wage costs of researches twiddling their thumbs whilst the system was restored to working order. If I am being cynical if you can afford DDN storage in the first place stop winging about GPFS license costs. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From yeep at robust.my Fri Apr 17 15:01:05 2020 From: yeep at robust.my (T.A. Yeep) Date: Fri, 17 Apr 2020 22:01:05 +0800 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hi JAB, Sound interesting, however, I'm actually a newcomer to Scale, I wish I could share the joy of mixing that. I guess maybe it is something similar to LSF RVU/UVUs? Thanks for sharing your experience anyway. Hi Carl, I just want to let you know that I have got your explanation, and I understand it now. Thanks. Not sure If I should always reply a "thank you" or "I've got it" in the mailing list, or better just do it privately. Same I'm new to mailing list too, so please let me know if I should not reply it publicly. On Fri, Apr 17, 2020 at 6:50 PM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 17/04/2020 11:31, T.A. Yeep wrote: > > Hi Carl, > > > > I'm confused here, in the previous email it was said *And for ESS, it is > > licensed Per Drive with different prices for HDDs and SSDs.* > > > > But then you mentioned in below email that: > > But new customers and new OEM systems are *all licensed by Capacity. > > This also applies to IBM's own ESS*: you can keep upgrading your old (if > > hardware is supported) gen 1 ESS on Sockets, but if you replace it with > > *a new ESS, that will come with capacity licenses*. > > > > Now the question, ESS is license per Drive or by capacity? > > > > Well by drive is "capacity" based licensing unless you have some sort of > magical infinite capacity drives :-) > > Under the PVU scheme if you know what you are doing you could game the > system. For example get a handful of servers get PVU licenses for them > create a GPFS file system handing off the back using say Fibre Channel > and cheap FC attached arrays (Dell MD3000 series springs to mind) and > then hang many PB off the back. I could using this scheme create a 100PB > filesystem for under a thousand PVU of GPFS server licenses. Add in > another cluster for protocol nodes and if you are not mounting on HPC > nodes that's a winner :-) > > In a similar manner I use a pimped out ancient Dell R300 with dual core > Xeon for backing up my GPFS filesystem because it's 100PVU of TSM > licensing and I am cheap, and besides it is more than enough grunt for > the job. A new machine would be 240 PVU minimum (4*70). I plan on > replacing the PERC SAS6 card with a H710 and new internal cabling to run > RHEL8 :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. 
Yeep*Mobile: 016-719 8506 | Tel/Fax: 03-6261 7237 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Sat Apr 18 16:04:53 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Sat, 18 Apr 2020 11:04:53 -0400 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: Is this still true if the source and target fileset are both in the same storage pool? It seems like they could just move the metadata? Especially in the case of dependent filesets where the metadata is actually in the same allocation area for both the source and target. Maybe this just doesn?t happen often enough to optimize? -- Stephen > On Apr 16, 2020, at 12:50 PM, Oesterlin, Robert wrote: > > Moving data between filesets is like moving files between file systems. Normally when you move files between directories, it?s simple metadata, but with filesets (dependent or independent) is a full copy and delete of the old data. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > on behalf of "J. Eric Wonderley" > > Reply-To: gpfsug main discussion list > > Date: Thursday, April 16, 2020 at 11:32 AM > To: gpfsug main discussion list > > Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question > > I have filesets setup in a filesystem...looks like: > [root at cl005 ~]# mmlsfileset home -L > Filesets in file system 'home': > Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment > root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset > hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 > predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 > HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 > socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 > arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 > arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 > > I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Mon Apr 20 09:29:17 2020 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 20 Apr 2020 10:29:17 +0200 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: Hi, we recognized this behavior when we tried to move HSM migrated files between filesets. This cases a recall. Very annoying when the data are afterword stored on the same pools and have to be migrated back to tape. @IBM: should we open a RFE to address this? Stephan Am 18.04.2020 um 17:04 schrieb Stephen Ulmer: > Is this still true if the source and target fileset are both in the same > storage pool? It seems like they could just move the metadata? > Especially in the case of dependent filesets where the metadata is > actually in the same allocation area for both the source and target. > > Maybe this just doesn?t happen often enough to optimize? > > -- > Stephen > > > >> On Apr 16, 2020, at 12:50 PM, Oesterlin, Robert >> > wrote: >> >> Moving data between filesets is like moving files between file >> systems. 
Normally when you move files between directories, it's simple >> metadata, but with filesets (dependent or independent) is a full copy >> and delete of the old data. >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> *From:*> > on behalf of "J. >> Eric Wonderley" > >> *Reply-To:*gpfsug main discussion list >> > > >> *Date:*Thursday, April 16, 2020 at 11:32 AM >> *To:*gpfsug main discussion list > > >> *Subject:*[EXTERNAL] [gpfsug-discuss] gpfs filesets question >> I have filesets setup in a filesystem...looks like: >> [root at cl005 ~]# mmlsfileset home -L >> Filesets in file system 'home': >> Name Id RootInode ParentId Created >> InodeSpace MaxInodes AllocInodes Comment >> root 0 3 -- Tue Jun 30 >> 07:54:09 2015 0 402653184 320946176 root fileset >> hess 1 543733376 0 Tue Jun 13 >> 14:56:13 2017 0 0 0 >> predictHPC 2 1171116 0 Thu Jan 5 >> 15:16:56 2017 0 0 0 >> HYCCSIM 3 544258049 0 Wed Jun 14 >> 10:00:41 2017 0 0 0 >> socialdet 4 544258050 0 Wed Jun 14 >> 10:01:02 2017 0 0 0 >> arc 5 1171073 0 Thu Jan 5 >> 15:07:09 2017 0 0 0 >> arcadm 6 1171074 0 Thu Jan 5 >> 15:07:10 2017 0 0 0 >> I beleive these are dependent filesets. Dependent on the root >> fileset. Anyhow a user wants to move a large amount of data from one >> fileset to another. Would this be a metadata only operation? He has >> attempted to small amount of data and has noticed some thrasing. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5360 bytes Desc: S/MIME Cryptographic Signature URL: From olaf.weiser at de.ibm.com Mon Apr 20 11:54:06 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 20 Apr 2020 10:54:06 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From skariapaul at gmail.com Wed Apr 22 04:40:28 2020 From: skariapaul at gmail.com (PS K) Date: Wed, 22 Apr 2020 11:40:28 +0800 Subject: [gpfsug-discuss] S3, S3A & S3n support Message-ID: Hi, Does SS object protocol support S3a and S3n? Regards Paul -------------- next part -------------- An HTML attachment was scrubbed...
URL: From pinto at scinet.utoronto.ca Wed Apr 22 09:19:10 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 22 Apr 2020 04:19:10 -0400 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) Message-ID: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> In case you missed (the forum has been pretty quiet about this one), CVE-2020-4273 had an update yesterday: https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E If you can't do the upgrade now, at least apply the mitigation to the client nodes generally exposed to unprivileged users: Check the setuid bit: ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}' Apply the mitigation: ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s /usr/lpp/mmfs/bin/"$9)}' Verification: ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}' All the best Jaime . . . ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From st.graf at fz-juelich.de Wed Apr 22 10:02:59 2020 From: st.graf at fz-juelich.de (Stephan Graf) Date: Wed, 22 Apr 2020 11:02:59 +0200 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Hi I took a look at the "Readme and Release notes for release 5.0.4.3 IBM Spectrum Scale 5.0.4.3 Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" But I did not find the entry which mentioned the "For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is mentioned on the "Security Bulletin: A vulnerability has been identified in IBM Spectrum Scale where an unprivileged user could execute commands as root ( CVE-2020-4273)" page. Shouldn't it be mentioned there? Stephan Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > In case you missed (the forum has been pretty quiet about this one), > CVE-2020-4273 had an update yesterday: > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > If you can't do the upgrade now, at least apply the mitigation to the > client nodes generally exposed to unprivileged users: > > Check the setuid bit: > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > /usr/lpp/mmfs/bin/"$9)}' > > Apply the mitigation: > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > /usr/lpp/mmfs/bin/"$9)}' > > Verification: > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > /usr/lpp/mmfs/bin/"$9)}' > > All the best > Jaime > > . > . > . ************************************ > TELL US ABOUT YOUR SUCCESS STORIES > http://www.scinethpc.ca/testimonials > ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave.
(MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5360 bytes Desc: S/MIME Cryptographic Signature URL: From knop at us.ibm.com Wed Apr 22 16:42:54 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 22 Apr 2020 15:42:54 +0000 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: , <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... URL: From thakur.hpc at gmail.com Wed Apr 22 19:23:53 2020 From: thakur.hpc at gmail.com (Bhupender thakur) Date: Wed, 22 Apr 2020 11:23:53 -0700 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Has IBM released or does IBM plan to release a fix in the 5.0.3.x branch? On Wed, Apr 22, 2020 at 8:45 AM Felipe Knop wrote: > Stephan, > > Security bulletins need to go through an internal process, including legal > review. In addition, we are normally required to ensure the fix is > available for all releases before the security bulletin can be published. > Because of that, we normally don't list details for security fixes in > either the readmes or APARs, since the information can only be disclosed in > the bulletin itself. > > ---- > The bulletin below has: > > If you cannot apply the latest level of service, contact IBM Service for > an efix: > > - For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438 > > - For IBM Spectrum Scale V4.2.0.0 through V4.2.3.20, reference APAR > IJ23426 > "V5.0.0.0 through V5.0.4.1" should have been "V5.0.0.0 through V5.0.4.2". > (I have asked the text to be corrected) > > > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Stephan Graf > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 5:04 AM > > Hi > > I took a lookat the "Readme and Release notes for release 5.0.4.3 IBM > Spectrum Scale 5.0.4.3 > Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" > But I did not find the entry which mentioned the "For IBM Spectrum Scale > V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is > mentioned on the "Security Bulletin: A vulnerability has been identified > in IBM Spectrum Scale where an unprivileged user could execute commands > as root ( CVE-2020-4273)" page. > > shouldn't it be mentioned there? 
> > Stephan > > > Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > > In case you missed (the forum has been pretty quiet about this one), > > CVE-2020-4273 had an update yesterday: > > > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > > > > If you can't do the upgrade now, at least apply the mitigation to the > > client nodes generally exposed to unprivileged users: > > > > Check the setuid bit: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > Apply the mitigation: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > > /usr/lpp/mmfs/bin/"$9)}' > > > > Verification: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > All the best > > Jaime > > > > . > > . > > . ************************************ > > TELL US ABOUT YOUR SUCCESS STORIES > > http://www.scinethpc.ca/testimonials > > ************************************ > > --- > > Jaime Pinto - Storage Analyst > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > 661 University Ave. (MaRS), Suite 1140 > > Toronto, ON, M5G1M1 > > P: 416-978-2755 > > C: 416-505-1477 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Apr 22 21:05:49 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 22 Apr 2020 20:05:49 +0000 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: , <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... URL: From thakur.hpc at gmail.com Wed Apr 22 21:47:30 2020 From: thakur.hpc at gmail.com (Bhupender thakur) Date: Wed, 22 Apr 2020 13:47:30 -0700 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Thanks for the clarification Felipe. On Wed, Apr 22, 2020 at 1:06 PM Felipe Knop wrote: > Bhupender, > > PTFs for the 5.0.3 branch are no longer produced (as is the case for > 5.0.2, 5.0.1, and 5.0.0), but efixes for 5.0.3 can be requested. 
When > requesting the efix, please indicate the APAR number listed in bulletin > below, as well as the location of the bulletin itself, just in case: > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Bhupender thakur > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 2:24 PM > > Has IBM released or does IBM plan to release a fix in the 5.0.3.x branch? > > On Wed, Apr 22, 2020 at 8:45 AM Felipe Knop wrote: > > Stephan, > > Security bulletins need to go through an internal process, including legal > review. In addition, we are normally required to ensure the fix is > available for all releases before the security bulletin can be published. > Because of that, we normally don't list details for security fixes in > either the readmes or APARs, since the information can only be disclosed in > the bulletin itself. > > ---- > The bulletin below has: > > If you cannot apply the latest level of service, contact IBM Service for > an efix: > > - For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438 > > - For IBM Spectrum Scale V4.2.0.0 through V4.2.3.20, reference APAR > IJ23426 > "V5.0.0.0 through V5.0.4.1" should have been "V5.0.0.0 through V5.0.4.2". > (I have asked the text to be corrected) > > > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Stephan Graf > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 5:04 AM > > Hi > > I took a lookat the "Readme and Release notes for release 5.0.4.3 IBM > Spectrum Scale 5.0.4.3 > Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" > But I did not find the entry which mentioned the "For IBM Spectrum Scale > V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is > mentioned on the "Security Bulletin: A vulnerability has been identified > in IBM Spectrum Scale where an unprivileged user could execute commands > as root ( CVE-2020-4273)" page. > > shouldn't it be mentioned there? 
> > Stephan > > > Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > > In case you missed (the forum has been pretty quiet about this one), > > CVE-2020-4273 had an update yesterday: > > > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > > > > If you can't do the upgrade now, at least apply the mitigation to the > > client nodes generally exposed to unprivileged users: > > > > Check the setuid bit: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > Apply the mitigation: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > > /usr/lpp/mmfs/bin/"$9)}' > > > > Verification: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > All the best > > Jaime > > > > . > > . > > . ************************************ > > TELL US ABOUT YOUR SUCCESS STORIES > > http://www.scinethpc.ca/testimonials > > ************************************ > > --- > > Jaime Pinto - Storage Analyst > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > 661 University Ave. (MaRS), Suite 1140 > > Toronto, ON, M5G1M1 > > P: 416-978-2755 > > C: 416-505-1477 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Apr 22 23:34:33 2020 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 22 Apr 2020 22:34:33 +0000 Subject: [gpfsug-discuss] Is there a difference in suspend and empty NSD state? Message-ID: Hello all, Looking at the man page, it is fairly ambiguous as to these NSD states actually being different (and if not WHY have to names for the same thing?!): suspend or empty Instructs GPFS to stop allocating space on the specified disk. Put a disk in this state when you are preparing to remove the file system data from the disk or if you want to prevent new data from being put on the disk. This is a user-initiated state that GPFS never enters without an explicit command to change the disk state. Existing data on a suspended disk may still be read or updated. A disk remains in a suspended or to be emptied state until it is explicitly resumed. Restarting GPFS or rebooting nodes does not restore normal access to a suspended disk. And from the examples lower in the page: Note: In product versions earlier than V4.1.1, the mmlsdisk command lists the disk status as suspended. In product versions V4.1.1 and later, the mmlsdisk command lists the disk status as to be emptied with both mmchdisk suspend or mmchdisk empty commands. 
And really what I currently want to do is suspend a set of disks, and then mark a different set of disks as "to be emptied". Then I will run a mmrestripefs operation to move the data off of the "to be emptied" disks, but not onto the suspended disks (which will also be removed from the file system in the near future). Once the NSDs are emptied then it will be a very (relatively) fast mmdeldisk operation. So is that possible? As you can likely tell, I don't have enough space to just delete both sets of disks at once during a (yay!) full file system migration to the new GPFS 5.x version. Thought this might be useful to others, so posted here. Thanks in advance neighbors! -Bryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From brnelson at us.ibm.com Thu Apr 23 00:49:13 2020 From: brnelson at us.ibm.com (Brian Nelson) Date: Wed, 22 Apr 2020 18:49:13 -0500 Subject: [gpfsug-discuss] S3, S3A & S3n support In-Reply-To: References: Message-ID: The Spectrum Scale Object protocol only has support for the traditional S3 object storage. -Brian =================================== Brian Nelson IBM Spectrum Scale brnelson at us.ibm.com ----- Original message ----- From: PS K Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] [gpfsug-discuss] S3, S3A & S3n support Date: Wed, Apr 22, 2020 12:03 AM Hi, Does SS object protocol support S3a and S3n? Regards Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 23 11:33:34 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 23 Apr 2020 18:33:34 +0800 Subject: [gpfsug-discuss] Is there a difference in suspend and empty NSD state? In-Reply-To: References: Message-ID: Option 'suspend' is the same as 'empty' if the cluster is updated to Scale 4.1.1. The option 'empty' was introduced in 4.1.1 to support disk deletion in a fast way; the 'suspend' option was retained out of consideration for existing users. > And really what I currently want to do is suspend a set of disks, > and then mark a different set of disks as "to be emptied". Then I > will run a mmrestripefs operation to move the data off of the "to be > emptied" disks, but not onto the suspended disks (which will also be > removed from the file system in the near future). Once the NSDs are > emptied then it will be a very (relatively) fast mmdeldisk > operation. So is that possible? It's possible only if these two sets of disks belong to two different pools. If they are in the same pool, mmrestripefs on the pool will migrate all data off these two sets of disks. If they are in two different pools, you can use mmrestripefs with the -P option to migrate data off "suspended" and "to be emptied" disks in the specified data pool. Please note that the system pool is special: mmrestripefs will unconditionally restripe the system pool even if you specify the -P option for a data pool. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.
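A minimal sketch of the sequence discussed above, for illustration only: the file system name "gpfs1", the pool name "data2" and the NSD names are hypothetical, and the mmrestripefs action flag to combine with -P (-b, -m or -r) should be checked against the mmrestripefs man page for the release in use.

# keep new allocations off the disks that will be retired later
mmchdisk gpfs1 suspend -d "nsd_old1;nsd_old2"
# mark the disks that should be drained now
mmchdisk gpfs1 empty -d "nsd_old3;nsd_old4"
# both sets should now show "to be emptied" in the status column
mmlsdisk gpfs1
# move data off the affected disks, restricted to the data pool they belong to
mmrestripefs gpfs1 -r -P data2
# once drained, deleting the emptied disks should be relatively quick
mmdeldisk gpfs1 "nsd_old3;nsd_old4"

As noted in the reply above, this only keeps the two sets of disks separate if they sit in different storage pools; within a single pool the restripe drains every suspended or to-be-emptied disk in that pool.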
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. gpfsug-discuss-bounces at spectrumscale.org wrote on 2020/04/23 06:34:33: > From: Bryan Banister > To: gpfsug main discussion list > Date: 2020/04/23 06:35 > Subject: [EXTERNAL] [gpfsug-discuss] Is there a difference in > suspend and empty NSD state? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hello all, > > Looking at the man page, it is fairly ambiguous as to these NSD > states actually being different (and if not WHY have to names for > the same thing?!): > > suspend > or > empty > Instructs GPFS to stop allocating space on the specified > disk. Put a disk in this state when you are preparing to > remove the file system data from the disk or if you want > to prevent new data from being put on the disk. This is > a user-initiated state that GPFS never enters without an > explicit command to change the disk state. Existing data > on a suspended disk may still be read or updated. > > A disk remains in a suspended or to be > emptied state until it is explicitly resumed. > Restarting GPFS or rebooting nodes does not restore > normal access to a suspended disk. > > And from the examples lower in the page: > Note: In product versions earlier than V4.1.1, the > mmlsdisk command lists the disk status as > suspended. In product versions V4.1.1 and later, the > mmlsdisk command lists the disk status as to be > emptied with both mmchdisk suspend or mmchdisk > empty commands. > > > And really what I currently want to do is suspend a set of disks, > and then mark a different set of disks as ?to be emptied?. Then I > will run a mmrestripefs operation to move the data off of the ?to be > emptied? disks, but not onto the suspended disks (which will also be > removed from the file system in the near future). Once the NSDs are > emptied then it will be a very (relatively) fast mmdeldisk > operation. So is that possible? > > As you can likely tell, I don?t have enough space to just delete > both sets of disks at once during a (yay!) full file system > migration to the new GPFS 5.x version. > > Thought this might be useful to others, so posted here. Thanks in > advance neighbors! > -Bryan_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? > u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=QxEYrybXOI6xpUEVxZumWQYDMDbDLx4O4vrm0PNotMw&s=4M2- > uNMOrvL7kEQu_UmL5VvnkKfPL-EpSapVGkSX1jc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Apr 23 13:55:43 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 23 Apr 2020 12:55:43 +0000 Subject: [gpfsug-discuss] S3, S3A & S3n support Message-ID: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> From PS K: >Does SS object protocol support S3a and S3n? Can you share some more details of your requirements, use case, etc., either here on the list or privately with me? We?re currently looking at the strategic direction of our S3 support. As Brian said, today it?s strictly the ?traditional? S3 protocol, but we are evaluating where to go next. 
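For context, S3A (and the older S3N) are Hadoop client-side connectors rather than server features, so from the Spark side the question is largely whether the object endpoint behaves the way S3A expects. A hedged sketch of how a Spark job is normally pointed at an S3-compatible endpoint, with hypothetical endpoint, credentials, bucket and job name:

spark-submit \
  --conf spark.hadoop.fs.s3a.endpoint=https://s3.example.com:8080 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  --conf spark.hadoop.fs.s3a.connection.ssl.enabled=true \
  --conf spark.hadoop.fs.s3a.access.key=ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=SECRET_KEY \
  my_job.py s3a://mybucket/input/

Whether the Swift-based object endpoint satisfies the S3A connector's expectations (multipart upload, path-style access and so on) is exactly the open question being evaluated here.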
Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_219535040] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From skariapaul at gmail.com Fri Apr 24 09:24:53 2020 From: skariapaul at gmail.com (PS K) Date: Fri, 24 Apr 2020 16:24:53 +0800 Subject: [gpfsug-discuss] S3, S3A & S3n support In-Reply-To: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> References: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> Message-ID: This is for spark integration which supports only s3a. Cheers On Thu, Apr 23, 2020 at 8:55 PM Carl Zetie - carlz at us.ibm.com < carlz at us.ibm.com> wrote: > From PS K: > > >Does SS object protocol support S3a and S3n? > > > > Can you share some more details of your requirements, use case, etc., > either here on the list or privately with me? > > > > We?re currently looking at the strategic direction of our S3 support. As > Brian said, today it?s strictly the ?traditional? S3 protocol, but we are > evaluating where to go next. > > > > Thanks, > > > > Carl Zetie > > Program Director > > Offering Management > > Spectrum Scale > > ---- > > (919) 473 3318 ][ Research Triangle Park > > carlz at us.ibm.com > > [image: signature_219535040] > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: not available URL: From TROPPENS at de.ibm.com Mon Apr 27 10:28:59 2020 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 27 Apr 2020 09:28:59 +0000 Subject: [gpfsug-discuss] Chart decks of German User Meeting are now available Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Tue Apr 28 07:34:37 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 28 Apr 2020 08:34:37 +0200 (CEST) Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? Message-ID: <239358449.52194.1588055677577@privateemail.com> Hi All, Can anyone share some thoughts on how to tune AFM for stability? at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ? 
Cache Site only: TCP Settings: sunrpc.tcp_slot_table_entries = 128 Home and Cache: AFM / GPFS Settings: maxBufferDescs=163840 afmHardMemThreshold=25G afmMaxWriteMergeLen=30G Cache fileset: Attributes for fileset AFMFILESET: ================================ Status Linked Path /mnt/fs02/AFMFILESET Id 1 Root inode 524291 Parent Id 0 Created Tue Apr 14 15:57:43 2020 Comment Inode space 1 Maximum number of inodes 10000384 Allocated inodes 10000384 Permission change flag chmodAndSetacl afm-associated Yes Target nfs://DK_VPN/mnt/fs01/AFMFILESET Mode single-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Read Threads per Gateway 64 Parallel Read Chunk Size 128 Parallel Read Threshold 1024 Number of Gateway Flush Threads 48 Prefetch Threshold 0 (default) Eviction Enabled yes (default) Parallel Write Threshold 1024 Parallel Write Chunk Size 128 Number of Write Threads per Gateway 16 IO Flags 0 (default) mmfsadm dump afm: AFM Gateway: RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 readBypassThresh 67108864 QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 Ping thread: Started Fileset: AFMFILESET 1 (fs02) mode: single-writer queue: Normal MDS: QMem 0 CTL 577 home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 handler: Mounted Dirty refCount: 1 queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 i/o: prefetchThresh 0 (Prefetch) Mnt status: 0:0 1:0 2:0 3:0 Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ Priority Queue: Empty (state: Active) Normal Queue: Empty (state: Active) Cluster Config Cache: maxFilesToCache 131072 maxStatCache 524288 afmDIO 2 afmIOFlags 4096 maxReceiverThreads 32 afmNumReadThreads 64 afmNumWriteThreads 8 afmHardMemThreshold 26843545600 maxBufferDescs 163840 afmMaxWriteMergeLen 32212254720 workerThreads 1024 The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) Many Thanks in Advance! Andi Christiansen -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From u.sibiller at science-computing.de Tue Apr 28 11:57:48 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Tue, 28 Apr 2020 12:57:48 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup Message-ID: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Hi, when the gpfs systemd service returns from startup the filesystems are usually not mounted. So having another service depending on gpfs is not feasible if you require the filesystem(s). Therefore we have added a script to the systemd gpfs service that waits for all local gpfs filesystems being mounted. We have added that script via ExecStartPost: ------------------------------------------------------------ # cat /etc/systemd/system/gpfs.service.d/waitmount.conf [Service] ExecStartPost=/usr/local/sc-gpfs/sbin/wait-for-all_local-mounts.sh TimeoutStartSec=200 ------------------------------------------------------------- The script itself is not doing much: ------------------------------------------------------------- #!/bin/bash # # wait until all _local_ gpfs filesystems are mounted. It ignored # filesystems where mmlsfs -A does not report "yes". # # returns 0 if all fs are mounted (or none are found in gpfs configuration) # returns non-0 otherwise # wait for max. TIMEOUT seconds TIMEOUT=180 # leading space is required! FS=" $(/usr/lpp/mmfs/bin/mmlsfs all_local -Y 2>/dev/null | grep :automaticMountOption:yes: | cut -d: -f7 | xargs; exit ${PIPESTATUS[0]})" # RC=1 and no output means there are no such filesystems configured in GPFS [ $? -eq 1 ] && [ "$FS" = " " ] && exit 0 # uncomment this line for testing #FS="$FS gpfsdummy" while [ $TIMEOUT -gt 0 ]; do for fs in ${FS}; do if findmnt $fs -n &>/dev/null; then FS=${FS/ $fs/} continue 2; fi done [ -z "${FS// /}" ] && break (( TIMEOUT -= 5 )) sleep 5 done if [ -z "${FS// /}" ]; then exit 0 else echo >&2 "ERROR: filesystem(s) not found in time:${FS}" exit 2 fi -------------------------------------------------- This works without problems on _most_ of our clusters. However, not on all. Some of them show what I believe is a race condition and fail to startup after a reboot: ---------------------------------------------------------------------- # journalctl -u gpfs -- Logs begin at Fri 2020-04-24 17:11:26 CEST, end at Tue 2020-04-28 12:47:34 CEST. -- Apr 24 17:12:13 myhost systemd[1]: Starting General Parallel File System... Apr 24 17:12:17 myhost mmfs[5720]: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg. Apr 24 17:13:44 myhost systemd[1]: gpfs.service start-post operation timed out. Stopping. Apr 24 17:13:44 myhost mmremote[8966]: Shutting down! Apr 24 17:13:48 myhost mmremote[8966]: Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfs26 Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfslinux Apr 24 17:13:48 myhost systemd[1]: Failed to start General Parallel File System. Apr 24 17:13:48 myhost systemd[1]: Unit gpfs.service entered failed state. Apr 24 17:13:48 myhost systemd[1]: gpfs.service failed. 
---------------------------------------------------------------------- The mmfs.log shows a bit more: ---------------------------------------------------------------------- # less /var/adm/ras/mmfs.log.previous 2020-04-24_17:12:14.609+0200: runmmfs starting (4254) 2020-04-24_17:12:14.622+0200: [I] Removing old /var/adm/ras/mmfs.log.* files: 2020-04-24_17:12:14.658+0200: runmmfs: [I] Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra 2020-04-24_17:12:14.692+0200: runmmfs: [I] Unloading module mmfs26 2020-04-24_17:12:14.901+0200: runmmfs: [I] Unloading module mmfslinux 2020-04-24_17:12:15.018+0200: runmmfs: [I] Unloading module tracedev 2020-04-24_17:12:15.057+0200: runmmfs: [I] Loading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra Module Size Used by mmfs26 2657452 0 mmfslinux 809734 1 mmfs26 tracedev 48618 2 mmfs26,mmfslinux 2020-04-24_17:12:16.720+0200: Node rebooted. Starting mmautoload... 2020-04-24_17:12:17.011+0200: [I] This node has a valid standard license 2020-04-24_17:12:17.011+0200: [I] Initializing the fast condition variables at 0x5561DFC365C0 ... 2020-04-24_17:12:17.011+0200: [I] mmfsd initializing. {Version: 5.0.4.2 Built: Jan 27 2020 12:13:06} ... 2020-04-24_17:12:17.011+0200: [I] Cleaning old shared memory ... 2020-04-24_17:12:17.012+0200: [I] First pass parsing mmfs.cfg ... 2020-04-24_17:12:17.013+0200: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg. 2020-04-24_17:12:20.667+0200: mmautoload: Starting GPFS ... 2020-04-24_17:13:44.846+0200: mmremote: Initiating GPFS shutdown ... 2020-04-24_17:13:47.861+0200: mmremote: Starting the mmsdrserv daemon ... 2020-04-24_17:13:47.955+0200: mmremote: Unloading GPFS kernel modules ... 2020-04-24_17:13:48.165+0200: mmremote: Completing GPFS shutdown ... -------------------------------------------------------------------------- Starting the gpfs service again manually then works without problems. Interestingly the missing mmfs.cfg _is there_ after the shutdown, it gets created shortly after the failure. That's why I am assuming a race condition: -------------------------------------------------------------------------- # stat /var/mmfs/gen/mmfs.cfg File: ?/var/mmfs/gen/mmfs.cfg? Size: 408 Blocks: 8 IO Block: 4096 regular file Device: fd00h/64768d Inode: 268998265 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:var_t:s0 Access: 2020-04-27 17:12:19.801060073 +0200 Modify: 2020-04-24 17:12:17.617823441 +0200 Change: 2020-04-24 17:12:17.659823405 +0200 Birth: - -------------------------------------------------------------------------- Now, the interesting part: - removing the ExecStartPost script makes the issue vanish. Reboot is always startign gpfs successfully - reducing the ExecStartPost to simply one line ("exit 0") makes the issue stay. gpfs startup always fails. Unfortunately IBM is refusing support because "the script is not coming with gpfs". So I am searching for a solution that makes the script work on those servers again. Or a better way to wait for all local gpfs mounts being ready. Has anyone written something like that already? Thank you, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From stockf at us.ibm.com Tue Apr 28 12:30:38 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 28 Apr 2020 11:30:38 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Apr 28 12:37:24 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 28 Apr 2020 17:07:24 +0530 Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? In-Reply-To: <239358449.52194.1588055677577@privateemail.com> References: <239358449.52194.1588055677577@privateemail.com> Message-ID: Hi, What is lock down of AFM fileset ? Are the messages in requeued state and AFM won't replicate any data ? I would recommend opening a ticket by collecting the logs and internaldump from the gateway node when the replication is stuck. You can also try increasing the value of afmAsyncOpWaitTimeout option and see if this solves the issue. mmchconfig afmAsyncOpWaitTimeout=3600 -i ~Venkat (vpuvvada at in.ibm.com) From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 04/28/2020 12:04 PM Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Can anyone share some thoughts on how to tune AFM for stability? at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ?
Cache Site only: TCP Settings: sunrpc.tcp_slot_table_entries = 128 Home and Cache: AFM / GPFS Settings: maxBufferDescs=163840 afmHardMemThreshold=25G afmMaxWriteMergeLen=30G Cache fileset: Attributes for fileset AFMFILESET: ================================ Status Linked Path /mnt/fs02/AFMFILESET Id 1 Root inode 524291 Parent Id 0 Created Tue Apr 14 15:57:43 2020 Comment Inode space 1 Maximum number of inodes 10000384 Allocated inodes 10000384 Permission change flag chmodAndSetacl afm-associated Yes Target nfs://DK_VPN/mnt/fs01/AFMFILESET Mode single-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Read Threads per Gateway 64 Parallel Read Chunk Size 128 Parallel Read Threshold 1024 Number of Gateway Flush Threads 48 Prefetch Threshold 0 (default) Eviction Enabled yes (default) Parallel Write Threshold 1024 Parallel Write Chunk Size 128 Number of Write Threads per Gateway 16 IO Flags 0 (default) mmfsadm dump afm: AFM Gateway: RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 readBypassThresh 67108864 QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 Ping thread: Started Fileset: AFMFILESET 1 (fs02) mode: single-writer queue: Normal MDS: QMem 0 CTL 577 home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 handler: Mounted Dirty refCount: 1 queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 i/o: prefetchThresh 0 (Prefetch) Mnt status: 0:0 1:0 2:0 3:0 Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ Priority Queue: Empty (state: Active) Normal Queue: Empty (state: Active) Cluster Config Cache: maxFilesToCache 131072 maxStatCache 524288 afmDIO 2 afmIOFlags 4096 maxReceiverThreads 32 afmNumReadThreads 64 afmNumWriteThreads 8 afmHardMemThreshold 26843545600 maxBufferDescs 163840 afmMaxWriteMergeLen 32212254720 workerThreads 1024 The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) Many Thanks in Advance! 
Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=-XbtU1ILcqI_bUurDD3j1j-oqGszcNZAbQVIhQ5EZOs&s=IjrGy-VdY1cuNfy0bViEykWMEVDax7_xvrMdRhQ2QkM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Apr 28 12:38:01 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 28 Apr 2020 12:38:01 +0100 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> On 28/04/2020 11:57, Ulrich Sibiller wrote: > Hi, > > when the gpfs systemd service returns from startup the filesystems are > usually not mounted. So having another service depending on gpfs is not > feasible if you require the filesystem(s). > > Therefore we have added a script to the systemd gpfs service that waits > for all local gpfs filesystems being mounted. We have added that script > via ExecStartPost: > Yuck, and double yuck. There are many things you can say about systemd (and I have a choice few) but one of them is that it makes this sort of hackery obsolete. At least that is one of it goals. A systemd way to do it would be via one or more helper units. So lets assume your GPFS file system is mounted on /gpfs, then create a file called ismounted.txt on it and then create a unit called say gpfs_mounted.target that looks like # gpfs_mounted.target [Unit] TimeoutStartSec=infinity ConditionPathExists=/gpfs/ismounted.txt ExecStart=/usr/bin/sleep 10 RemainAfterExit=yes Then the main unit gets Wants=gpfs_mounted.target After=gpfs_mounted.target If you are using scripts in systemd you are almost certainly doing it wrong :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From juergen.hannappel at desy.de Tue Apr 28 12:55:50 2020 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Tue, 28 Apr 2020 13:55:50 +0200 (CEST) Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> Message-ID: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Hi, a gpfs.mount target should be automatically created at boot by the systemd-fstab-generator from the fstab entry, so no need with hackery like ismountet.txt... ----- Original Message ----- > From: "Jonathan Buzzard" > To: gpfsug-discuss at spectrumscale.org > Sent: Tuesday, 28 April, 2020 13:38:01 > Subject: Re: [gpfsug-discuss] wait for mount during gpfs startup > Yuck, and double yuck. There are many things you can say about systemd > (and I have a choice few) but one of them is that it makes this sort of > hackery obsolete. At least that is one of it goals. > > A systemd way to do it would be via one or more helper units. 
So lets > assume your GPFS file system is mounted on /gpfs, then create a file > called ismounted.txt on it and then create a unit called say > gpfs_mounted.target that looks like > > > # gpfs_mounted.target > [Unit] > TimeoutStartSec=infinity > ConditionPathExists=/gpfs/ismounted.txt > ExecStart=/usr/bin/sleep 10 > RemainAfterExit=yes > > Then the main unit gets > > Wants=gpfs_mounted.target > After=gpfs_mounted.target > > If you are using scripts in systemd you are almost certainly doing it > wrong :-) > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From carlz at us.ibm.com Tue Apr 28 13:10:56 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Tue, 28 Apr 2020 12:10:56 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup (Ulrich Sibiller) Message-ID: There?s an RFE related to this: RFE 125955 (https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955) I recommend that people add their votes and comments there as well as discussing it here in the UG. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1027147421] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From andi at christiansen.xxx Tue Apr 28 13:25:37 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 28 Apr 2020 14:25:37 +0200 (CEST) Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? In-Reply-To: References: <239358449.52194.1588055677577@privateemail.com> Message-ID: <467674858.57941.1588076737138@privateemail.com> Hi Venkat, The AFM fileset becomes totally unresponsive from all nodes within the cluster and the only way to resolve it is to do a "mmshutdown" and wait 2 mins, then "mmshutdown" again as it cannot really do it the first time.. and then a "mmstartup" then all is back to normal and AFM is stopped and can be started again for another week or so.. mmafmctl stop -j will just hang endless.. i will try to set that value and see if that does anything for us :) Thanks! Best Regards Andi Christiansen > On April 28, 2020 1:37 PM Venkateswara R Puvvada wrote: > > > Hi, > > What is lock down of AFM fileset ? Are the messages in requeued state and AFM won't replicate any data ? I would recommend opening a ticket by collecting the logs and internaldump from the gateway node when the replication is stuck. > > You can also try increasing the value of afmAsyncOpWaitTimeout option and see if this solves the issue. > > mmchconfig afmAsyncOpWaitTimeout=3600 -i > > ~Venkat (vpuvvada at in.ibm.com) > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 04/28/2020 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > --------------------------------------------- > > > > Hi All, > > Can anyone share some thoughts on how to tune AFM for stability? 
at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ? > > > Cache Site only: > TCP Settings: > sunrpc.tcp_slot_table_entries = 128 > > > Home and Cache: > AFM / GPFS Settings: > maxBufferDescs=163840 > afmHardMemThreshold=25G > afmMaxWriteMergeLen=30G > > > Cache fileset: > Attributes for fileset AFMFILESET: > ================================ > Status Linked > Path /mnt/fs02/AFMFILESET > Id 1 > Root inode 524291 > Parent Id 0 > Created Tue Apr 14 15:57:43 2020 > Comment > Inode space 1 > Maximum number of inodes 10000384 > Allocated inodes 10000384 > Permission change flag chmodAndSetacl > afm-associated Yes > Target nfs://DK_VPN/mnt/fs01/AFMFILESET > Mode single-writer > File Lookup Refresh Interval 30 (default) > File Open Refresh Interval 30 (default) > Dir Lookup Refresh Interval 60 (default) > Dir Open Refresh Interval 60 (default) > Async Delay 15 (default) > Last pSnapId 0 > Display Home Snapshots no > Number of Read Threads per Gateway 64 > Parallel Read Chunk Size 128 > Parallel Read Threshold 1024 > Number of Gateway Flush Threads 48 > Prefetch Threshold 0 (default) > Eviction Enabled yes (default) > Parallel Write Threshold 1024 > Parallel Write Chunk Size 128 > Number of Write Threads per Gateway 16 > IO Flags 0 (default) > > > mmfsadm dump afm: > AFM Gateway: > RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 > readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 > readBypassThresh 67108864 > QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 > Ping thread: Started > Fileset: AFMFILESET 1 (fs02) > mode: single-writer queue: Normal MDS: QMem 0 CTL 577 > home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 > handler: Mounted Dirty refCount: 1 > queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 > remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 > queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 > handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 > lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 > i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 > i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 > i/o: prefetchThresh 0 (Prefetch) > Mnt status: 0:0 1:0 2:0 3:0 > Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ > Priority Queue: Empty (state: Active) > Normal Queue: Empty (state: Active) > > > Cluster Config Cache: > maxFilesToCache 131072 > maxStatCache 524288 > afmDIO 2 > afmIOFlags 4096 > maxReceiverThreads 32 > afmNumReadThreads 64 > afmNumWriteThreads 8 > afmHardMemThreshold 26843545600 > maxBufferDescs 163840 > afmMaxWriteMergeLen 32212254720 > workerThreads 1024 > > > The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. 
> > > The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. > > > If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) > > > Many Thanks in Advance! > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Tue Apr 28 14:57:36 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 28 Apr 2020 06:57:36 -0700 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: <20200428135736.3zqcvvupj2ipvjfw@illiuin> We use callbacks successfully to ensure Linux auditd rules are only loaded after GPFS is mounted. It was easy to setup, and there's very fine-grained events that you can trigger on: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmaddcallback.htm On Tue, Apr 28, 2020 at 11:30:38AM +0000, Frederick Stock wrote: > Have you looked a the mmaddcallback command and specifically the file system mount callbacks? -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From novosirj at rutgers.edu Tue Apr 28 17:33:34 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 28 Apr 2020 16:33:34 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Message-ID: <2F49D93E-18CA-456D-9815-ACB581A646B7@rutgers.edu> Has anyone confirmed this? At one point, I mucked around with this somewhat endlessly to try to get something sane and systemd-based to work and ultimately surrendered and inserted a 30 second delay. I didn?t try the ?check for the presence of a file? thing as I?m allergic to that sort of thing (at least more allergic than I am to a time-based delay). I believe everything that I tried happens before the mount is complete. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Apr 28, 2020, at 7:55 AM, Hannappel, Juergen wrote: > > Hi, > a gpfs.mount target should be automatically created at boot by the > systemd-fstab-generator from the fstab entry, so no need with hackery like > ismountet.txt... > > > ----- Original Message ----- >> From: "Jonathan Buzzard" >> To: gpfsug-discuss at spectrumscale.org >> Sent: Tuesday, 28 April, 2020 13:38:01 >> Subject: Re: [gpfsug-discuss] wait for mount during gpfs startup > >> Yuck, and double yuck. There are many things you can say about systemd >> (and I have a choice few) but one of them is that it makes this sort of >> hackery obsolete. At least that is one of it goals. 
>> >> A systemd way to do it would be via one or more helper units. So lets >> assume your GPFS file system is mounted on /gpfs, then create a file >> called ismounted.txt on it and then create a unit called say >> gpfs_mounted.target that looks like >> >> >> # gpfs_mounted.target >> [Unit] >> TimeoutStartSec=infinity >> ConditionPathExists=/gpfs/ismounted.txt >> ExecStart=/usr/bin/sleep 10 >> RemainAfterExit=yes >> >> Then the main unit gets >> >> Wants=gpfs_mounted.target >> After=gpfs_mounted.target >> >> If you are using scripts in systemd you are almost certainly doing it >> wrong :-) >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Tue Apr 28 18:32:25 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 28 Apr 2020 17:32:25 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup (Ulrich Sibiller) In-Reply-To: References: Message-ID: I?ve also voted and commented on the ticket, but I?ll say this here: If the amount of time I spent on this alone (and I like to think I?m pretty good with this sort of thing, and am somewhat of a systemd evangelist when the opportunity presents itself), this has caused a lot of people a lot of pain ? including time spent when their kludge to make this work causes some other problem, or having to reboot nodes in a much more manual way at times to ensure one of these nodes doesn?t dump work while it has no FS, etc. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Apr 28, 2020, at 8:10 AM, Carl Zetie - carlz at us.ibm.com wrote: > > There?s an RFE related to this: RFE 125955 (https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955) > > I recommend that people add their votes and comments there as well as discussing it here in the UG. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at spectrumscale.org Wed Apr 29 22:29:34 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 29 Apr 2020 22:29:34 +0100 Subject: [gpfsug-discuss] THINK Virtual User Group Day Message-ID: <5BE5B210-5FEE-45E0-AC0D-1B184B5B8E45@spectrumscale.org> Hi All, As part of IBM?s THINK digital event, there will be a virtual user group day. This isn?t an SSUG event, though we?ve been involved in some of the discussion about the topics for the event. Three of the four Storage sessions are focussed on Spectrum Scale. For storage this will be taking place on May 19th. Details of how to register for this event and the planned sessions are below (though I guess are still subject to change). 
Separately to this, the SSUG organisers are still in discussion about how we might present some sort of digital SSUG event, it won?t be a half/full day of talks, but likely a series of talks ? but we?re still working through the details with Ulf and the IBM team about how it might work. And if you are interested in THINK, this is free to register for this year as a digital only event https://www.ibm.com/events/think ? I promise this is my only reference to THINK ? Simon The registration site for the user group day is https://ibm-usergroups.bemyapp.com/ Storage Session 1 Title IBM Spectrum Scale: Use Cases and Field Lessons-learned with Kubernetes and OpenShift Abstract IBM Spectrum Scale user group leaders will discuss how to deploy IBM Spectrum Scale using Kubernetes and OpenShift, persistent volumes, IBM Storage Enabler for Containers, Kubernetes FlexVolume Drivers and IBM Spectrum Connect. We'll review real-world IBM Spectrum Scale use cases including advanced driver assistance systems (ADAS), cloud service providers (CSP), dev/test and multi-cloud. We'll also review most often-requested client topics including unsupported CSI platforms, security, multi-tenancy and how to deploy Spectrum Scale in heterogenous environments such as x86, IBM Power, and IBM Z by using IBM Cloud Private and OpenShift. Finally we'll describe IBM resources such as regional storage competency centers, training, testing labs and IBM Lab Services. Presenter Harald Seipp, Senior Technical Staff Member, Center of Excellence for Cloud Storage Storage Session 2 Title How to Efficiently Manage your Hadoop and Analytics Workflow with IBM Spectrum Scale Abstract This in-depth technical talk will compare traditional Hadoop vs. IBM Spectrum Scale through Hadoop Distributed File System (HDFS) on IBM Spectrum Scale, HDFS storage tiering & federation, HDFS backup, using IBM Spectrum Scale as an ingest tier, next generation workloads, disaster recovery and fault-tolerance using a single stretch cluster or multiple clusters using active file management (AFM), as well as HDFS integration within Cluster Export Services (CES). Presenter Andreas Koeninger, IBM Spectrum Scale Big Data and Analytics Storage Session 3 Title IBM Spectrum Scale: How to enable AI Workloads with OpenShift and IBM Spectrum Scale Abstract IBM Spectrum Scale user group leaders will deliver a in-depth technical presentation covering the enterprise AI data pipeline from ingest to insights, how to manage workloads at scale, how to integrate OpenShift 4.x and IBM Spectrum Scale 5.0.4.1, as well as preparing and installing the IBM Spectrum Scale CSI driver in OpenShift. We will also cover Kubernetes/OpenShift persistent volumes and use cases for provisioning with IBM Spectrum Scale CSI for AI workloads. Finally we will feature a demo of IBM Spectrum Scale CSI and TensorFlow in OpenShift 4.x. Presenters Gero Schmidt, IBM Spectrum Scale Development, Big Data Analytics Solutions Przemyslaw Podfigurny, IBM Spectrum Scale Development, AI/ML Big Data and Analytics Storage Session 4 Title Journey to Modern Data Protection for a Large Manufacturing Client Abstract In this webinar, we will discuss how industrial manufacturing organizations are addressing data protection. We will look at why holistic data protection is a critical infrastructure component and how modernization can provide a foundation for the future. 
We will share how customers are leveraging the IBM Spectrum Protect portfolio to address their IT organization's data protection, business continuity with software-defined data protection solutions. We will discuss various applications including data reuse, as well as providing instant access to data which can help an organization be more agile and reduce downtime. Presenters Adam Young, Russell Dwire -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Thu Apr 30 11:50:27 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 12:50:27 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <20200428135736.3zqcvvupj2ipvjfw@illiuin> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: Am 28.04.20 um 15:57 schrieb Skylar Thompson: >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > We use callbacks successfully to ensure Linux auditd rules are only loaded > after GPFS is mounted. It was easy to setup, and there's very fine-grained > events that you can trigger on: Thanks. But how do set this up for a systemd service? Disable the dependent service and start it from the callback? Create some kind of state file in the callback and let the dependent systemd service check that flag file in a busy loop? Use inotify for the flag file? Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Thu Apr 30 11:50:39 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 12:50:39 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Message-ID: <4c9f3acc-cfc7-05a5-eca5-2054c67c0cc4@science-computing.de> Am 28.04.20 um 13:55 schrieb Hannappel, Juergen: > a gpfs.mount target should be automatically created at boot by the > systemd-fstab-generator from the fstab entry, so no need with hackery like > ismountet.txt... A generic gpfs.mount target does not seem to exist on my system. There are only specific mount targets for the mounted gpfs filesystems. So I'd need to individually configure each depend service on each system with the filesystem for wait for. My approach was more general in just waiting for all_local gpfs filesystems. So I can use the same configuration everywhere. Besides, I have once tested and found that these targets are not usable because of some oddities but unfortunately I don't remember details. But the outcome was my script from the initial post. Maybe it was that there's no automatic mount target for all_local, same problem as above. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Thu Apr 30 12:14:07 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 13:14:07 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> Message-ID: Am 28.04.20 um 13:38 schrieb Jonathan Buzzard: > Yuck, and double yuck. There are many things you can say about systemd > (and I have a choice few) but one of them is that it makes this sort of > hackery obsolete. At least that is one of it goals. > > A systemd way to do it would be via one or more helper units. So lets > assume your GPFS file system is mounted on /gpfs, then create a file > called ismounted.txt on it and then create a unit called say > gpfs_mounted.target that looks like > > > # gpfs_mounted.target > [Unit] > TimeoutStartSec=infinity > ConditionPathExists=/gpfs/ismounted.txt > ExecStart=/usr/bin/sleep 10 > RemainAfterExit=yes > > Then the main unit gets > > Wants=gpfs_mounted.target > After=gpfs_mounted.target > > If you are using scripts in systemd you are almost certainly doing it > wrong :-) Yes, that the right direction. But still not the way I'd like it to be. First, I don't really like the flag file stuff. Imagine the mess you'd create if multiple services would require flag files... Second, I am looking for an all_local target. That one cannot be solved using this approach, right? (same for all_remote or all) Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From scale at us.ibm.com Thu Apr 30 12:40:57 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 30 Apr 2020 07:40:57 -0400 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de><20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: I now better understand the functionality you were aiming to achieve. You want anything in systemd that is dependent on GPFS file systems being mounted to block until they are mounted. Currently we do not offer any such feature though as Carl Zetie noted there is an RFE for such functionality, RFE 125955 ( https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955 ). For the mmaddcallback what I was thinking could resolve your problem was for you to create a either a "startup" callback or "mount" callbacks for your file systems. I thought you could use those callbacks to track the file systems of interest and then use the appropriate means to integrate that information into the flow of systemd. I have never done this so perhaps it is not possible. 
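For illustration, a minimal sketch of that idea, assuming a small helper script installed on every node and a locally defined gpfs-mounted.target unit (the callback name, script path and unit name are all hypothetical):

# register an asynchronous callback that fires on each local file system mount
mmaddcallback gpfsMountNotify --command /usr/local/sbin/gpfs-mount-notify.sh \
    --event mount --parms "%fsName" --async

# /usr/local/sbin/gpfs-mount-notify.sh
#!/bin/bash
# translate the GPFS mount event into a systemd unit that dependent
# services can order on with Wants=/After=gpfs-mounted.target
fs="$1"
logger "GPFS file system ${fs} is mounted, releasing dependent units"
systemctl start gpfs-mounted.target

A dependent service would then declare Wants=gpfs-mounted.target and After=gpfs-mounted.target instead of polling for the mount; covering the all_local case (only releasing the target once every local file system is mounted) would still need the kind of bookkeeping the script earlier in this thread does.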
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ulrich Sibiller To: gpfsug-discuss at spectrumscale.org Date: 04/30/2020 06:57 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] wait for mount during gpfs startup Sent by: gpfsug-discuss-bounces at spectrumscale.org Am 28.04.20 um 15:57 schrieb Skylar Thompson: >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > We use callbacks successfully to ensure Linux auditd rules are only loaded > after GPFS is mounted. It was easy to setup, and there's very fine-grained > events that you can trigger on: Thanks. But how do set this up for a systemd service? Disable the dependent service and start it from the callback? Create some kind of state file in the callback and let the dependent systemd service check that flag file in a busy loop? Use inotify for the flag file? Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=KmkFZ30Ey3pB4QnhsP2vS2mmojVLAWGrIiStGaE0320&s=VHWoLbiq119iFhL724WAQwg4dSJ3KRNVSXnfrFBv9RQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Apr 30 14:43:28 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 30 Apr 2020 06:43:28 -0700 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: <20200430134328.7qshqlrptw6hquls@illiuin> On Thu, Apr 30, 2020 at 12:50:27PM +0200, Ulrich Sibiller wrote: > Am 28.04.20 um 15:57 schrieb Skylar Thompson: > >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > > > We use callbacks successfully to ensure Linux auditd rules are only loaded > > after GPFS is mounted. It was easy to setup, and there's very fine-grained > > events that you can trigger on: > > Thanks. But how do set this up for a systemd service? Disable the dependent service and start it > from the callback? Create some kind of state file in the callback and let the dependent systemd > service check that flag file in a busy loop? Use inotify for the flag file? 
In the pre-systemd days, I would say just disable the service and let the callback handle it. I do see your point, though, that you lose the other systemd ordering benefits if you start the service from the callback. Assuming you're still able to start the service via systemctl, I would probably just leave it disabled and let the callback handle it. In the case of auditd rules, it's not actually a service (just a command that needs to be run) so we didn't run into this specific problem. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine
From jonathan.buzzard at strath.ac.uk Wed Apr 1 15:06:43 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 1 Apr 2020 15:06:43 +0100 Subject: [gpfsug-discuss] DSS-G dowloads Message-ID: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
From S.J.Thompson at bham.ac.uk Wed Apr 1 15:40:30 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 1 Apr 2020 14:40:30 +0000 Subject: [gpfsug-discuss] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> It moved. We had email notifications about this ages ago. Accounts were created automatically for us for those on the contract admin role. 2.5c is latest release (5.0.4-1.6 or 4.2.3-18) Go to https://commercial.lenovo.com/ Simon On 01/04/2020, 15:06, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jroche at lenovo.com Wed Apr 1 15:34:36 2020 From: jroche at lenovo.com (Jim Roche) Date: Wed, 1 Apr 2020 14:34:36 +0000 Subject: [gpfsug-discuss] [External] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: Hi Jonathan, I don't think the site has moved. I'm investigating why it cannot be found and will let you know what is going on. Regards, Jim Jim Roche Head of Research Computing University Relations Manager Redwood, 3 Chineham Business Park, Crockford Lane Basingstoke Hampshire RG24 8WQ Lenovo UK +44 7702678579 jroche at lenovo.com ? Lenovo.com? Twitter?|?Instagram?|?Facebook?|?Linkedin?|?YouTube?|?Privacy? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 01 April 2020 15:07 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] DSS-G dowloads I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ncalimet at lenovo.com Wed Apr 1 15:46:32 2020 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Wed, 1 Apr 2020 14:46:32 +0000 Subject: [gpfsug-discuss] [External] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: <477be93f0bc8411a8d8c31935db28a4f@lenovo.com> The old Lenovo ESD website is gone; retired some time ago. Please visit instead: https://commercial.lenovo.com FWIW the most current release is DSS-G 2.5c. Thanks -- Nicolas Calimet, PhD | HPC System Architect | Lenovo DCG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Wednesday, April 1, 2020 16:07 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] DSS-G dowloads I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Apr 1 19:50:28 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 1 Apr 2020 19:50:28 +0100 Subject: [gpfsug-discuss] DSS-G dowloads In-Reply-To: <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> Message-ID: On 01/04/2020 15:40, Simon Thompson wrote: > It moved. We had email notifications about this ages ago. Accounts > were created automatically for us for those on the contract admin > role. 2.5c is latest release (5.0.4-1.6 or 4.2.3-18) > You are right once I search my spam folder. Thanks a bunch Microsoft. I am still not convinced that are still not evil. They seem determined to put my CentOS security emails in the spam folder. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jkavitsky at 23andme.com Fri Apr 3 23:25:33 2020 From: jkavitsky at 23andme.com (Jim Kavitsky) Date: Fri, 3 Apr 2020 15:25:33 -0700 Subject: [gpfsug-discuss] fast search for archivable data sets Message-ID: Hello everyone, I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, Jim Kavitsky -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Sat Apr 4 00:45:18 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 3 Apr 2020 19:45:18 -0400 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: Hi Jim, If you never worked with policy rules before, you may want to start by building your nerves to it. In the /usr/lpp/mmfs/samples/ilm path you will find several examples of templates that you can use to play around. I would start with the 'list' rules first. Some of those templates are a bit complex, so here is one script that I use on a regular basis to detect files larger than 1MB (you can even exclude specific filesets): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-large /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'largefiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files that have more than 1MB of space allocated. */ RULE 'r2' LIST 'largefiles' SHOW('-u' vc(USER_ID) || ' -s' || vc(FILE_SIZE)) /*FROM POOL 'system'*/ FROM POOL 'data' /*FOR FILESET('root')*/ WEIGHT(FILE_SIZE) WHERE KB_ALLOCATED > 1024 /* Files in special filesets, such as mmpolicyRules, are never moved or deleted */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('mmpolicyRules','todelete','tapenode-stuff','toarchive') ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ And here is another to detect files not looked at for more than 6 months. I found more effective to use atime and ctime. 
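These rule files are driven by mmapplypolicy rather than run directly. A minimal invocation, with the device name and paths being placeholders only, would be roughly:

  # dry run, just report what the rules would select
  mmapplypolicy fs0 -P /path/to/mmpolicyRules-list-large -I test -L 2

  # real run, but defer the external list script and write the matches
  # to plain list files under the -f prefix instead
  mmapplypolicy fs0 -P /path/to/mmpolicyRules-list-large -I defer -f /tmp/largefiles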
You could combine this with the one above to detect file size as well. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-atime-ctime-gt-6months /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'accessedfiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc, accessed prior to a certain date AND that are not owned by root. Include the owner's id with each object and sort them by the owner's id */ /* Files in special filesets, such as mmpolicyRules, are never moved or deleted */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET ('scratch-root','todelete','root') RULE 'r5' LIST 'accessedfiles' DIRECTORIES_PLUS FROM POOL 'data' SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -c' || vc(CREATION_TIME) || ' -s ' || vc(FILE_SIZE)) WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 183) AND (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME) > 183) AND NOT USER_ID = 0 AND NOT (PATH_NAME LIKE '/gpfs/fs0/scratch/r/root/%') ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that both these scripts work on a system wide (or root fileset) basis, and will not give you specific directories, unless you run them several times on specific directories (not very efficient). To produce general lists per directory you would need to do some post processing on the lists, with 'awk' or some other scripting language. If you need some samples I can send you. And finally, you need to be more specific by what you mean by 'archivable'. Once you produce the list you can do several things with them or leverage the rules to actually execute things, such as move, delete, or hsm stuff. The /usr/lpp/mmfs/samples/ilm path has some samples as well. On 4/3/2020 18:25:33, Jim Kavitsky wrote: > Hello everyone, > I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking?for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. > > Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, yes, there is another way, the 'mmfind' utility, also in the same sample path. You have to compile it for you OS (mmfind.README). This is a very powerful canned procedure that lets you run the "-exec" option just as in the normal linux version of 'find'. I use it very often, and it's just as efficient as the other policy rules based alternative. Good luck. Keep safe and confined. Jaime > > Jim Kavitsky > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > . . . ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From alex at calicolabs.com Sat Apr 4 00:50:50 2020 From: alex at calicolabs.com (Alex Chekholko) Date: Fri, 3 Apr 2020 16:50:50 -0700 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: Hi Jim, The common non-GPFS-specific way is to use a tool that dumps all of your filesystem metadata into an SQL database and then you can have a webapp that makes nice graphs/reports from the SQL database, or do your own queries. The Free Software example is "Robinhood" (use the POSIX scanner, not the lustre-specific one) and one proprietary example is Starfish. In both cases, you need a pretty beefy machine for the DB and the scanning of your filesystem may take a long time, depending on your filesystem performance. And then without any filesystem-specific hooks like a transaction log, you'll need to rescan the entire filesystem to update your db. Regards, Alex On Fri, Apr 3, 2020 at 3:25 PM Jim Kavitsky wrote: > Hello everyone, > I'm managing a low-multi-petabyte Scale filesystem with hundreds of > millions of inodes, and I'm looking for the best way to locate archivable > directories. For example, these might be directories where whose contents > were greater than 5 or 10TB, and whose contents had atimes greater than two > years. > > Has anyone found a great way to do this with a policy engine run? If not, > is there another good way that anyone would recommend? Thanks in advance, > > Jim Kavitsky > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Sat Apr 4 01:26:22 2020 From: cblack at nygenome.org (Christopher Black) Date: Sat, 4 Apr 2020 00:26:22 +0000 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: As Alex mentioned, there are tools that will keep filesystem metadata in a database and provide query tools. NYGC uses Starfish and we?ve had good experience with it. At first the only feature we used is ?sfdu? which is a quick replacement for recursive du. Using this we can script csv reports for selections of dirs. As we use starfish more, we?ve started opening the web interface to people to look at selected areas of our filesystems where they can sort directories by size, mtime, atime, and run other reports and queries. We?ve also started using tagging functionality so we can quickly get an aggregate total (and growth over time) by tag across multiple directories. We tried Robinhood years ago but found it was taking too much work to get it to scale to 100s of millions of files and 10s of PiB on gpfs. It might be better now. IBM has a metadata product called Spectrum Discover that has the benefit of using gpfs-specific interfaces to be always up to date. Many of the other tools require scheduling scans to update the db. Igneous has a commercial tool called DataDiscover which also looked promising. ClarityNow and MediaFlux are other similar tools. I expect all of these tools at the very least have nice replacements for du and find as well as some sort of web directory tree view. We had run Starfish for a while and did a re-evaluation of a few options in 2019 and ultimately decided to stay with Starfish for now. 
Best, Chris From: on behalf of Alex Chekholko Reply-To: gpfsug main discussion list Date: Friday, April 3, 2020 at 7:51 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] fast search for archivable data sets Hi Jim, The common non-GPFS-specific way is to use a tool that dumps all of your filesystem metadata into an SQL database and then you can have a webapp that makes nice graphs/reports from the SQL database, or do your own queries. The Free Software example is "Robinhood" (use the POSIX scanner, not the lustre-specific one) and one proprietary example is Starfish. In both cases, you need a pretty beefy machine for the DB and the scanning of your filesystem may take a long time, depending on your filesystem performance. And then without any filesystem-specific hooks like a transaction log, you'll need to rescan the entire filesystem to update your db. Regards, Alex On Fri, Apr 3, 2020 at 3:25 PM Jim Kavitsky > wrote: Hello everyone, I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, Jim Kavitsky _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Sat Apr 4 07:00:34 2020 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 4 Apr 2020 16:00:34 +1000 Subject: [gpfsug-discuss] afmHashVersion Message-ID: I was wondering if there was any more information on the different values for afmHashVersion the default value is 2 but if we want to assign an afmGateway to a fileset we need a value of 5 is there likely to be any performance degradation because of this change do the home cluster and the cache cluster both have to be set to 5 for the fileset allocation to gateways just trying to find a little more information before we try this on a production system with a large number of afm independent filesets leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Sat Apr 4 22:57:33 2020 From: spectrumscale at kiranghag.com (KG) Date: Sun, 5 Apr 2020 03:27:33 +0530 Subject: [gpfsug-discuss] io500 - mmfind - Pfind found 0 matches, something is wrong with the script. Message-ID: Hi Folks I am trying to setup IO500 test on a scale cluster and looking for more info on mmfind. I have compiled mmfindUtil_processOutputFile and updated the correct path in mmfind.sh. The runs however do not come up with any matches. Any pointers wrt something that I may have missed? 
TIA [Starting] mdtest_hard_write [Exec] mpirun -np 2 /tools/io-500-dev-master/bin/mdtest -C -t -F -P -w *3901 *-e 3901 -d /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/mdt_hard -n 950000 -x /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/mdt_hard-stonewall -a POSIX -N 1 -Y -W 5 [Results] in /ibm/nasdata/results/2020.04.05-03.20.00/mdtest_hard_write.txt. [Warning] This cannot be an official IO-500 score. The phase runtime of 9.8918s is below 300s. [Warning] Suggest io500_mdtest_hard_files_per_proc=30732525 [RESULT-invalid] IOPS phase 2 mdtest_hard_write 0.225 kiops : time 8.99 seconds [Starting] find [Exec] mpirun -np 2 /tools/io-500-dev-master/bin/mmfind.sh /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00 -newer /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/timestampfile -size *3901c *-name "*01*" [Results] in /ibm/nasdata/results/2020.04.05-03.20.00/find.txt. *[Warning] Pfind found 0 matches, something is wrong with the script.* [FIND] *MATCHED 0/3192* in 12.0671 seconds -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Apr 6 10:16:49 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 6 Apr 2020 14:46:49 +0530 Subject: [gpfsug-discuss] afmHashVersion In-Reply-To: References: Message-ID: afmHashVersion=5 does not cause any performance degradation, this hash version allows assigning a gateway for the fileset using mmchfileset command. This option is not required for AFM home cluster(assuming that home is not a cache for other home). It is needed only at the AFM cache cluster and at client cluster if it remote mounts the AFM cache cluster. For changing afmHashVersion=5, all the nodes in the AFM cache and client cluster have to be upgraded to the minimum 5.0.2 level. This option cannot be set dynamically using -i/-I option, all the nodes in the both AFM cache and client clusters have to be shutdown to set this option. It is recommended to use 5.0.4-3 or later. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_mmchconfig.htm ~Venkat (vpuvvada at in.ibm.com) From: leslie elliott To: gpfsug main discussion list Date: 04/04/2020 11:30 AM Subject: [EXTERNAL] [gpfsug-discuss] afmHashVersion Sent by: gpfsug-discuss-bounces at spectrumscale.org I was wondering if there was any more information on the different values for afmHashVersion the default value is 2 but if we want to assign an afmGateway to a fileset we need a value of 5 is there likely to be any performance degradation because of this change do the home cluster and the cache cluster both have to be set to 5 for the fileset allocation to gateways just trying to find a little more information before we try this on a production system with a large number of afm independent filesets leslie _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=Y5qSHFJ-z_7fbgD3YvcDG0SCsJbJ5rvNPBI5y5eF6Ec&s=b7XaEKNTas9WQ9qZNBSOW2XDvQNzUMTgdcAb7lQ4170&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marc.caubet at psi.ch Mon Apr 6 12:20:59 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Mon, 6 Apr 2020 11:20:59 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Message-ID: Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Mon Apr 6 13:25:22 2020 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Mon, 6 Apr 2020 14:25:22 +0200 Subject: [gpfsug-discuss] =?utf-8?q?=22csm=5Fresync=5Fneeded=22_after_upgr?= =?utf-8?q?ading_to_GPFS=09v5=2E0=2E4-2?= In-Reply-To: References: Message-ID: Hi, are the nodes running on AIX? If so my advice would be to change /var/mmfs/mmsysmon/mmsysmonitor.conf to read [InterNodeEventing] usesharedlib = 0 and the do a "mmsysmoncontrol restart". What was the min. release level before the upgrade? For most other cases a "mmsysmoncontrol restart" on the affected nodes + cluster manager node should do. Mit freundlichen Gr??en / Kind regards Norbert Schuld From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06.04.2020 13:36 Subject: [EXTERNAL] [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=i4V0h7L9ElftZNfcuPIXmAHN2jl5TLcuyFLqtinu4j8&m=gU-FoFUzF10SfzgJPcd51vPIxjhkE6puV5hxAyPIA6I&s=zdEGNkM_ZSiem6wnOFZFVpTGjvSPG4wlFUFIhDVqcWM&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Mon Apr 6 13:54:43 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Mon, 6 Apr 2020 12:54:43 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: References: , Message-ID: <46571b6503544f329029b2520c70152e@psi.ch> Hi Norbert, thanks a lot for for answering. The nodes are running RHEL7.7 (Kernel 3.10.0-1062.12.1.el7.x86_64). The previous version was 5.0.3-2. I restarted mmsysmoncontrol (I kept usesharedlib=1 as this is RHEL). Restarting it, it cleans mmhealth messages as expected, let's see whether this is repeated or not but it might take several minutes. Just add that when I had a mix of 5.0.3-2 and 5.0.4-2 I received some 'stale_mount' messages (from GPFSGUI) for a remote cluster filesystem mountpoints, but apparently everything worked fine. After upgrading everything to v5.0.4-2 looks like the same nodes report the 'csm_resync_needed' instead (no more 'stale_mount' errors seen since then). I am not sure whether this is related or not but might be a hint if this is related. Best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Norbert Schuld Sent: Monday, April 6, 2020 2:25:22 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Hi, are the nodes running on AIX? If so my advice would be to change /var/mmfs/mmsysmon/mmsysmonitor.conf to read [InterNodeEventing] usesharedlib = 0 and the do a "mmsysmoncontrol restart". What was the min. release level before the upgrade? For most other cases a "mmsysmoncontrol restart" on the affected nodes + cluster manager node should do. Mit freundlichen Gr??en / Kind regards Norbert Schuld [Inactive hide details for "Caubet Serrabou Marc (PSI)" ---06.04.2020 13:36:28---Hi all, after upgrading one of the clusters to]"Caubet Serrabou Marc (PSI)" ---06.04.2020 13:36:28---Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06.04.2020 13:36 Subject: [EXTERNAL] [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. 
Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From lists at esquad.de Mon Apr 6 13:50:28 2020 From: lists at esquad.de (Dieter Mosbach) Date: Mon, 6 Apr 2020 14:50:28 +0200 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: References: Message-ID: <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> Am 06.04.2020 um 13:20 schrieb Caubet Serrabou Marc (PSI): > Hi all, > > > after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. > > Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. > > > Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. This seems to be a bug in v5, open a support case. We had to check: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "grep usesharedlib /var/mmfs/mmsysmon/mmsysmonitor.conf" and to change: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "sed -i 's/usesharedlib = 1/usesharedlib = 0/g' /var/mmfs/mmsysmon/mmsysmonitor.conf" mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "mmsysmoncontrol restart" Regards, Dieter From marc.caubet at psi.ch Tue Apr 7 07:38:42 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Tue, 7 Apr 2020 06:38:42 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> References: , <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> Message-ID: <66cfa1b3942d45489c611d72e5b39d42@psi.ch> Hi, just for the record, after restarting mmsysmoncontrol on all nodes looks like the errors disappeared and no longer appear (and it has been running for several hours already). No need to change usesharedlib, which I have it enabled (1) for RHEL systems. Thanks a lot for your help, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Dieter Mosbach Sent: Monday, April 6, 2020 2:50:28 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Am 06.04.2020 um 13:20 schrieb Caubet Serrabou Marc (PSI): > Hi all, > > > after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. 
This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. > > Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. > > > Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. This seems to be a bug in v5, open a support case. We had to check: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "grep usesharedlib /var/mmfs/mmsysmon/mmsysmonitor.conf" and to change: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "sed -i 's/usesharedlib = 1/usesharedlib = 0/g' /var/mmfs/mmsysmon/mmsysmonitor.conf" mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "mmsysmoncontrol restart" Regards, Dieter _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Tue Apr 14 08:42:12 2020 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Tue, 14 Apr 2020 09:42:12 +0200 Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 Message-ID: *FYI* IBM Spectrum Discover is a next-generation metadata management solution that delivers exceptional performance at exabyte scale, so organizations can harness value from massive amounts of unstructured data from heterogeneous file and object storage on premises and in the cloud to create competitive advantage in the areas of analytics and AI initiatives, governance, and storage optimization. Here are other videos in this series of related IBM Spectrum Discover topics that give you examples to get started: 1) IBM Spectrum Discover: Download, Deploy, and Configure https://youtu.be/FMOuzn__qRI 2) IBM Spectrum Discover: Scanning S3 data sources such as Amazon S3 or Ceph https://youtu.be/zaADfeTGwzY 3) IBM Spectrum Discover: Scanning IBM Spectrum Scale (GPFS) and IBM ESS data sources https://youtu.be/3mBQciR2tXE 4) IBM Spectrum Discover: Scanning an IBM Spectrum Protect data source https://youtu.be/wdXvnJ_GEQs 5) IBM Spectrum Discover: Insights into your files for better TCO with IBM Spectrum Archive EE https://youtu.be/_YNfFDdMEa4 Appendix: Here are additional online educational materials related to IBM Spectrum Discover solutions: IBM Spectrum Discover Knowledge Center: https://www.ibm.com/support/knowledgecenter/SSY8AC IBM Spectrum Discover Free 90 Day Trial: https://www.ibm.com/us-en/marketplace/spectrum-discover IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage, REDP-5550: http://www.redbooks.ibm.com/abstracts/redp5550.html -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Am Weiher 24, 65451 Kelsterbach, Germany mailto:kraemerf at de.ibm.com Mobile +49171-3043699 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Apr 14 11:15:41 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 14 Apr 2020 10:15:41 +0000 Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 In-Reply-To: References: Message-ID: <714908f022894851b52efa0944c80737@bham.ac.uk> Just a reminder that this is a Spectrum Scale technical forum and shouldn't be used for marketing nor advertising of other products. 
There are a number of vendors who have competing products who might also wish to post here. If you wish to discuss Discover at a technical level, there is a dedicated channel on the slack community for this. Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of kraemerf at de.ibm.com Sent: 14 April 2020 08:42 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 *FYI* -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Apr 15 16:29:53 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 15:29:53 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 16:36:48 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 17:36:48 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: Message-ID: <20200415153648.GK30439@ics.muni.cz> Hello, I noticed this bug, it took about 10 minutes to crash. However, I'm seeing similar NULL pointer dereference even with older kernels, That dereference does not happen always in GPFS code, sometimes outside in NFS or elsewhere, however it looks familiar. I have many crashdumps about this. On Wed, Apr 15, 2020 at 03:29:53PM +0000, Felipe Knop wrote: > All, > ? > A problem has been identified with Spectrum Scale when running on RHEL 7.7 > and kernel 3.10.0-1062.18.1.el7.? While a fix is being currently > developed, customers should not move up to this kernel level. > ? > The new kernel was issued on March 17 via the following errata:? > [1]https://access.redhat.com/errata/RHSA-2020:0834 > ? > When this kernel is used with Scale, system crashes have been observed. > The following are a couple of examples of kernel stack traces for the > crash: > ? > ? > [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000040 > [ 2915.633770] IP: [] > cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > [ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 [mmfslinux] > [ 2915.921381]? [] ? take_dentry_name_snapshot+0xf0/0xf0 > [ 2915.928760]? [] ? shrink_dcache_parent+0x60/0x90 > [ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > [ 2915.941388]? [] do_rmdir+0x1f1/0x220 > [ 2915.947119]? [] ? __fput+0x186/0x260 > [ 2915.952849]? [] ? ____fput+0xe/0x10 > [ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > [ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > ? > [1224278.495993] [] __dentry_kill+0x128/0x190 > [1224278.496678] [] dput+0xb6/0x1a0 > [1224278.497378] [] d_prune_aliases+0xb6/0xf0 > [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 > [mmfslinux] > [1224278.498798] [] > _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > ? > ? > RHEL 7.8 is also impacted by the same problem, but validation of Scale > with 7.8 is still under way. > ? > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From laurence.schuler at nasa.gov Wed Apr 15 16:49:59 2020 From: laurence.schuler at nasa.gov (Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]) Date: Wed, 15 Apr 2020 15:49:59 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: Message-ID: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Will this impact *any* version of Spectrum Scale? -Laurence From: on behalf of Felipe Knop Reply-To: gpfsug main discussion list Date: Wednesday, April 15, 2020 at 11:30 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel All, A problem has been identified with Spectrum Scale when running on RHEL 7.7 and kernel 3.10.0-1062.18.1.el7. While a fix is being currently developed, customers should not move up to this kernel level. The new kernel was issued on March 17 via the following errata: https://access.redhat.com/errata/RHSA-2020:0834 When this kernel is used with Scale, system crashes have been observed. The following are a couple of examples of kernel stack traces for the crash: [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [ 2915.633770] IP: [] cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] [ 2915.914097] [] gpfs_i_rmdir+0x29c/0x310 [mmfslinux] [ 2915.921381] [] ? take_dentry_name_snapshot+0xf0/0xf0 [ 2915.928760] [] ? shrink_dcache_parent+0x60/0x90 [ 2915.935656] [] vfs_rmdir+0xdc/0x150 [ 2915.941388] [] do_rmdir+0x1f1/0x220 [ 2915.947119] [] ? __fput+0x186/0x260 [ 2915.952849] [] ? ____fput+0xe/0x10 [ 2915.958484] [] ? task_work_run+0xc0/0xe0 [ 2915.964701] [] SyS_unlinkat+0x25/0x40 [1224278.495993] [] __dentry_kill+0x128/0x190 [1224278.496678] [] dput+0xb6/0x1a0 [1224278.497378] [] d_prune_aliases+0xb6/0xf0 [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 [mmfslinux] [1224278.498798] [] _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] RHEL 7.8 is also impacted by the same problem, but validation of Scale with 7.8 is still under way. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 9466 bytes Desc: not available URL: From knop at us.ibm.com Wed Apr 15 17:25:41 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 16:25:41 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> References: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov>, Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 17:35:12 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 18:35:12 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: <20200415163512.GP30439@ics.muni.cz> And are you sure it is present only in -1062.18.1.el7 kernel? I think it is present in all -1062.* kernels.. 
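Until a fix ships, one way to keep nodes from picking up the affected kernel during routine patching is to exclude or version-lock the kernel packages. A rough sketch for yum on RHEL 7, where the pinned version is only an example and not a statement about which levels are safe:

  # either block kernel updates outright for now...
  echo "exclude=kernel*" >> /etc/yum.conf

  # ...or pin the currently installed kernel instead
  yum install -y yum-plugin-versionlock
  yum versionlock add kernel-3.10.0-1062.12.1.el7

Remove the exclude or the lock again once fixed Scale levels or kernel guidance are out.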
On Wed, Apr 15, 2020 at 04:25:41PM +0000, Felipe Knop wrote: > Laurence, > ? > The problem affects all the Scale releases / PTFs. > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > ? > ? > > ----- Original message ----- > From: "Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] [EXTERNAL] Kernel crashes with Spectrum > Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > Date: Wed, Apr 15, 2020 12:10 PM > ? > > Will this impact *any* version of Spectrum Scale? > > ? > > -Laurence > > ? > > From: on behalf of Felipe > Knop > Reply-To: gpfsug main discussion list > Date: Wednesday, April 15, 2020 at 11:30 AM > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum Scale > and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > ? > > All, > > ? > > A problem has been identified with Spectrum Scale when running on RHEL > 7.7 and kernel 3.10.0-1062.18.1.el7.? While a fix is being currently > developed, customers should not move up to this kernel level. > > ? > > The new kernel was issued on March 17 via the following errata:? > [1]https://access.redhat.com/errata/RHSA-2020:0834 > > ? > > When this kernel is used with Scale, system crashes have been observed. > The following are a couple of examples of kernel stack traces for the > crash: > > ? > > ? > > [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000040 > [ 2915.633770] IP: [] > cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > > [ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 > [mmfslinux] > [ 2915.921381]? [] ? > take_dentry_name_snapshot+0xf0/0xf0 > [ 2915.928760]? [] ? shrink_dcache_parent+0x60/0x90 > [ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > [ 2915.941388]? [] do_rmdir+0x1f1/0x220 > [ 2915.947119]? [] ? __fput+0x186/0x260 > [ 2915.952849]? [] ? ____fput+0xe/0x10 > [ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > [ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > > ? > > [1224278.495993] [] __dentry_kill+0x128/0x190 > [1224278.496678] [] dput+0xb6/0x1a0 > [1224278.497378] [] d_prune_aliases+0xb6/0xf0 > [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 > [mmfslinux] > [1224278.498798] [] > _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > > ? > > ? > > RHEL 7.8 is also impacted by the same problem, but validation of Scale > with 7.8 is still under way. > > ? > > ? > > ? Felipe > > ? > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > > ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [2]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > 2. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From knop at us.ibm.com Wed Apr 15 17:51:02 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 16:51:02 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <20200415163512.GP30439@ics.muni.cz> References: <20200415163512.GP30439@ics.muni.cz>, <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 18:06:57 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 19:06:57 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: <20200415163512.GP30439@ics.muni.cz> <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: <20200415170657.GQ30439@ics.muni.cz> Should I report then or just wait to fix 18.1 problem and see whether older ones are gone as well? On Wed, Apr 15, 2020 at 04:51:02PM +0000, Felipe Knop wrote: > Lukas, > ? > There was one particular kernel change introduced in 3.10.0-1062.18.1 that > has triggered a given set of crashes. It's possible, though, that there is > a lingering problem affecting older levels of 3.10.0-1062. I believe that > crashes occurring on older kernels should be treated as separate problems. > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > ? > ? > > ----- Original message ----- > From: Lukas Hejtmanek > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel crashes with Spectrum > Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > Date: Wed, Apr 15, 2020 12:35 PM > ? > And are you sure it is present only in -1062.18.1.el7 kernel? I think it > is > present in all -1062.* kernels.. > > On Wed, Apr 15, 2020 at 04:25:41PM +0000, Felipe Knop wrote: > > ? ?Laurence, > > ? ?? > > ? ?The problem affects all the Scale releases / PTFs. > > ? ?? > > ? ?? Felipe > > ? ?? > > ? ?---- > > ? ?Felipe Knop knop at us.ibm.com > > ? ?GPFS Development and Security > > ? ?IBM Systems > > ? ?IBM Building 008 > > ? ?2455 South Rd, Poughkeepsie, NY 12601 > > ? ?(845) 433-9314 T/L 293-9314 > > ? ?? > > ? ?? > > ? ?? > > > > ? ? ?----- Original message ----- > > ? ? ?From: "Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]" > > ? ? ? > > ? ? ?Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ? ? ?To: gpfsug main discussion list > > > ? ? ?Cc: > > ? ? ?Subject: Re: [gpfsug-discuss] [EXTERNAL] Kernel crashes with > Spectrum > > ? ? ?Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > ? ? ?Date: Wed, Apr 15, 2020 12:10 PM > > ? ? ?? > > > > ? ? ?Will this impact *any* version of Spectrum Scale? > > > > ? ? ?? > > > > ? ? ?-Laurence > > > > ? ? ?? > > > > ? ? ?From: on behalf of > Felipe > > ? ? ?Knop > > ? ? ?Reply-To: gpfsug main discussion list > > > ? ? ?Date: Wednesday, April 15, 2020 at 11:30 AM > > ? ? ?To: "gpfsug-discuss at spectrumscale.org" > > ? ? ? > > ? ? ?Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum > Scale > > ? ? ?and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > > > ? ? ?? > > > > ? ? ?All, > > > > ? ? ?? > > > > ? ? ?A problem has been identified with Spectrum Scale when running on > RHEL > > ? ? ?7.7 and kernel 3.10.0-1062.18.1.el7.? 
While a fix is being > currently > > ? ? ?developed, customers should not move up to this kernel level. > > > > ? ? ?? > > > > ? ? ?The new kernel was issued on March 17 via the following errata:? > > ? ? ?[1][1]https://access.redhat.com/errata/RHSA-2020:0834? > > > > ? ? ?? > > > > ? ? ?When this kernel is used with Scale, system crashes have been > observed. > > ? ? ?The following are a couple of examples of kernel stack traces for > the > > ? ? ?crash: > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?[ 2915.625015] BUG: unable to handle kernel NULL pointer > dereference at > > ? ? ?0000000000000040 > > ? ? ?[ 2915.633770] IP: [] > > ? ? ?cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > > > > ? ? ?[ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 > > ? ? ?[mmfslinux] > > ? ? ?[ 2915.921381]? [] ? > > ? ? ?take_dentry_name_snapshot+0xf0/0xf0 > > ? ? ?[ 2915.928760]? [] ? > shrink_dcache_parent+0x60/0x90 > > ? ? ?[ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > > ? ? ?[ 2915.941388]? [] do_rmdir+0x1f1/0x220 > > ? ? ?[ 2915.947119]? [] ? __fput+0x186/0x260 > > ? ? ?[ 2915.952849]? [] ? ____fput+0xe/0x10 > > ? ? ?[ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > > ? ? ?[ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > > > > ? ? ?? > > > > ? ? ?[1224278.495993] [] __dentry_kill+0x128/0x190 > > ? ? ?[1224278.496678] [] dput+0xb6/0x1a0 > > ? ? ?[1224278.497378] [] d_prune_aliases+0xb6/0xf0 > > ? ? ?[1224278.498083] [] > cxiPruneDCacheEntry+0x13a/0x1c0 > > ? ? ?[mmfslinux] > > ? ? ?[1224278.498798] [] > > ? ? ?_ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?RHEL 7.8 is also impacted by the same problem, but validation of > Scale > > ? ? ?with 7.8 is still under way. > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?? Felipe > > > > ? ? ?? > > > > ? ? ?---- > > ? ? ?Felipe Knop knop at us.ibm.com > > ? ? ?GPFS Development and Security > > ? ? ?IBM Systems > > ? ? ?IBM Building 008 > > ? ? ?2455 South Rd, Poughkeepsie, NY 12601 > > ? ? ?(845) 433-9314 T/L 293-9314 > > ? ? ?? > > > > ? ? ?? > > ? ? ?_______________________________________________ > > ? ? ?gpfsug-discuss mailing list > > ? ? ?gpfsug-discuss at spectrumscale.org > > ? ? ?[2][2]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > > > ? ?? > > > > References > > > > ? ?Visible links > > ? ?1. [3]https://access.redhat.com/errata/RHSA-2020:0834? > > ? ?2. [4]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > ??Full Time Multitasking Ninja > ??is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [6]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > ? > > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > 2. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 3. https://access.redhat.com/errata/RHSA-2020:0834 > 4. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 5. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 6. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From knop at us.ibm.com Wed Apr 15 19:17:15 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 18:17:15 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <20200415170657.GQ30439@ics.muni.cz> References: <20200415170657.GQ30439@ics.muni.cz>, <20200415163512.GP30439@ics.muni.cz><8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Thu Apr 16 04:26:36 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Thu, 16 Apr 2020 03:26:36 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing Message-ID: Hello All, As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). We would really like to stay with SS/GPFS and have been big advocates of SS/GPFS over the years, but the capacity based licensing is pushing us into evaluating alternatives. I realize this may not be proper to discuss this directly in this email list, so feel free to email directly with your suggestions or your plans. Thanks and kind regards, Dean -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Thu Apr 16 09:16:59 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 16 Apr 2020 08:16:59 +0000 Subject: [gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected Message-ID: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> Hello, I?m puzzled about the difference between the two mmhealth events longwaiters_found ERROR Detected Spectrum Scale long-waiters and deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock Especially why the later has level WARNING only while the first has level ERROR? Longwaiters_found is based on the output of ?mmdiag ?deadlock? and occurs much more often on our clusters, while the later probably is triggered by an external event and no internal mmsysmon check? Deadlock detection is handled by mmfsd? Whenever a deadlock is detected some debug data is collected, which is not true for longwaiters_detected. Hm, so why is no deadlock detected whenever mmdiag ?deadlock shows waiting threads? Shouldn?t the severity be the opposite way? Finally: Can we trigger some debug data collection whenever a longwaiters_found event happens ? just getting the output of ?mmdiag ?deadlock? on the single node could give some hints. Without I don?t see any real chance to take any action. Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Anna.Greim at de.ibm.com Thu Apr 16 11:55:56 2020 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 16 Apr 2020 12:55:56 +0200 Subject: [gpfsug-discuss] =?utf-8?q?Mmhealth_events_longwaiters=5Ffound_an?= =?utf-8?q?d=09deadlock=5Fdetected?= In-Reply-To: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> References: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> Message-ID: Hi Heiner, I'm not really able to give you insights into the decision of the events' states. Maybe somebody else is able to answer here. But about your triggering debug data collection question, please have a look at this documentation page: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm This feature is in the product since the 5.0.x versions and should be helpful here. It will trigger your eventsCallback script when the event is raised. One of the script's arguments is the event name. So it is possible to create a script, that checks for the event name longwaiters_found and then triggers a mmdiag --deadlock and write it into a txt file. The script call has a hard time out of 60 seconds so it does not interfere too much with the mmsysmon internals, but better would be a run time less than 1 second. Mit freundlichen Gr??en / Kind regards Anna Greim Software Engineer, Spectrum Scale Development IBM Systems IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 16/04/2020 10:36 Subject: [EXTERNAL] [gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I?m puzzled about the difference between the two mmhealth events longwaiters_found ERROR Detected Spectrum Scale long-waiters and deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock Especially why the later has level WARNING only while the first has level ERROR? Longwaiters_found is based on the output of ?mmdiag ?deadlock? and occurs much more often on our clusters, while the later probably is triggered by an external event and no internal mmsysmon check? Deadlock detection is handled by mmfsd? Whenever a deadlock is detected some debug data is collected, which is not true for longwaiters_detected. Hm, so why is no deadlock detected whenever mmdiag ?deadlock shows waiting threads? Shouldn?t the severity be the opposite way? Finally: Can we trigger some debug data collection whenever a longwaiters_found event happens ? just getting the output of ?mmdiag ?deadlock? on the single node could give some hints. Without I don?t see any real chance to take any action. Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=XLDdnBDnIn497KhM7_npStR6ig1r198VHeSBY1WbuHc&m=QAa_5ZRNpy310ikXZzwunhWU4TGKsH_NWDoYwh57MNo&s=dKWX1clbfClbfJb5yKSzhoNC1aqCbT6-7s1DQdx8CzY&e= -------------- next part -------------- An HTML attachment was scrubbed... 
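A minimal sketch of the eventsCallback approach Anna describes above, assuming the script is installed as /var/mmfs/etc/eventsCallback on the nodes to be monitored (the location used by the eventsCallback feature; check the linked documentation page for your release) and that the event name appears somewhere in the argument list. Because the exact argument order may differ between releases, the sketch simply scans all arguments. The output directory and file naming are illustrative only, not part of the product.

#!/bin/bash
# /var/mmfs/etc/eventsCallback -- called by mmsysmon whenever a health event is raised.
# Assumption: one of the arguments carries the event name (e.g. longwaiters_found),
# as described in the reply above, so we scan all arguments rather than rely on position.
OUTDIR=/var/log/gpfs-longwaiters    # illustrative location, adjust as needed
for arg in "$@"; do
    if [ "$arg" = "longwaiters_found" ]; then
        mkdir -p "$OUTDIR"
        # Run the data collection in the background so the callback returns quickly,
        # well within the 60-second hard timeout (ideally sub-second) mentioned above.
        nohup /usr/lpp/mmfs/bin/mmdiag --deadlock \
            > "$OUTDIR/mmdiag-deadlock.$(hostname -s).$(date +%Y%m%d-%H%M%S).txt" 2>&1 &
        break
    fi
done
exit 0

Remember to make the script executable (chmod +x) on every node where the event should be handled; backgrounding the mmdiag call keeps the callback itself far below the run-time limit while still capturing the long-waiter snapshot at the moment the event fires.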
URL: From carlz at us.ibm.com Thu Apr 16 13:44:14 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 16 Apr 2020 12:44:14 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: Folks, I need to correct a common misunderstanding that is perpetuated here: > As IBM has completely switched to capacity based licensing in order to use SS v5 For new customers, Scale is priced Per TB (we also have Per PB licenses now for convenience). This transition was completed in January 2019. And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs. Existing customers with Standard sockets can remain on and continue to buy more Standard sockets. There is no plan to end that entitlement. The same applies to customers with Advanced sockets who want to continue with Advanced. In both cases you can upgrade from V4.2 to V5.0 without changing your license metric. This licensing change is not connected to the migration from V4 to V5. However, I do see a lot of confusion around this point, including from my IBM colleagues, possibly because both transitions occurred around roughly the same time period. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com ? From dean.flanders at fmi.ch Thu Apr 16 14:00:49 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Thu, 16 Apr 2020 13:00:49 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hello Carl, Yes, for existing IBM direct customers that may have been the case for v4 to v5. However, from my understanding if a customer bought GPFS/SS via DDN, Lenovo, etc. with embedded systems licenses, this is not the case. From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 2:44 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Folks, I need to correct a common misunderstanding that is perpetuated here: > As IBM has completely switched to capacity based licensing in order to > use SS v5 For new customers, Scale is priced Per TB (we also have Per PB licenses now for convenience). This transition was completed in January 2019. And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs. Existing customers with Standard sockets can remain on and continue to buy more Standard sockets. There is no plan to end that entitlement. The same applies to customers with Advanced sockets who want to continue with Advanced. In both cases you can upgrade from V4.2 to V5.0 without changing your license metric. This licensing change is not connected to the migration from V4 to V5. However, I do see a lot of confusion around this point, including from my IBM colleagues, possibly because both transitions occurred around roughly the same time period. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com ? 
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Thu Apr 16 17:32:29 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 16 Apr 2020 12:32:29 -0400 Subject: [gpfsug-discuss] gpfs filesets question Message-ID: I have filesets setup in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Apr 16 18:11:40 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 16 Apr 2020 17:11:40 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From eric.wonderley at vt.edu Thu Apr 16 18:36:35 2020 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Thu, 16 Apr 2020 13:36:35 -0400 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: Hi Fred: I do. I have 3 pools. system, ssd data pool(fc_ssd400G) and a spinning disk pool(fc_8T). I want to think the ssd_data_pool is empty at the moment and the system pool is ssd and only contains metadata. [root at cl005 ~]# mmdf home -P fc_ssd400G disk disk size failure holds holds free KB free KB name in KB group metadata data in full blocks in fragments --------------- ------------- -------- -------- ----- -------------------- ------------------- Disks in storage pool: fc_ssd400G (Maximum disk size allowed is 97 TB) r10f1e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e7 1924720640 1001 No Yes 1924636672 (100%) 17408 ( 0%) r10f1e6 1924720640 1001 No Yes 1924636672 (100%) 17664 ( 0%) r10f1e5 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) ------------- -------------------- ------------------- (pool total) 13473044480 13472497664 (100%) 83712 ( 0%) More or less empty. Interesting... On Thu, Apr 16, 2020 at 1:11 PM Frederick Stock wrote: > Do you have more than one GPFS storage pool in the system? If you do and > they align with the filesets then that might explain why moving data from > one fileset to another is causing increased IO operations. > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > ----- Original message ----- > From: "J. 
Eric Wonderley" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question > Date: Thu, Apr 16, 2020 12:32 PM > > I have filesets setup in a filesystem...looks like: > [root at cl005 ~]# mmlsfileset home -L > Filesets in file system 'home': > Name Id RootInode ParentId Created > InodeSpace MaxInodes AllocInodes Comment > root 0 3 -- Tue Jun 30 > 07:54:09 2015 0 402653184 320946176 root fileset > hess 1 543733376 0 Tue Jun 13 > 14:56:13 2017 0 0 0 > predictHPC 2 1171116 0 Thu Jan 5 > 15:16:56 2017 0 0 0 > HYCCSIM 3 544258049 0 Wed Jun 14 > 10:00:41 2017 0 0 0 > socialdet 4 544258050 0 Wed Jun 14 > 10:01:02 2017 0 0 0 > arc 5 1171073 0 Thu Jan 5 > 15:07:09 2017 0 0 0 > arcadm 6 1171074 0 Thu Jan 5 > 15:07:10 2017 0 0 0 > > I beleive these are dependent filesets. Dependent on the root fileset. > Anyhow a user wants to move a large amount of data from one fileset to > another. Would this be a metadata only operation? He has attempted to > small amount of data and has noticed some thrasing. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Apr 16 18:55:09 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 16 Apr 2020 17:55:09 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Thu Apr 16 19:25:33 2020 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 16 Apr 2020 18:25:33 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: If my memory serves? any move of files between filesets requires data to be moved, regardless of pool allocation for the files that need to be moved, and regardless if they are dependent filesets are both in the same independent fileset. -Bryan From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of J. Eric Wonderley Sent: Thursday, April 16, 2020 12:37 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] gpfs filesets question [EXTERNAL EMAIL] Hi Fred: I do. I have 3 pools. system, ssd data pool(fc_ssd400G) and a spinning disk pool(fc_8T). I want to think the ssd_data_pool is empty at the moment and the system pool is ssd and only contains metadata. 
[root at cl005 ~]# mmdf home -P fc_ssd400G disk disk size failure holds holds free KB free KB name in KB group metadata data in full blocks in fragments --------------- ------------- -------- -------- ----- -------------------- ------------------- Disks in storage pool: fc_ssd400G (Maximum disk size allowed is 97 TB) r10f1e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e7 1924720640 1001 No Yes 1924636672 (100%) 17408 ( 0%) r10f1e6 1924720640 1001 No Yes 1924636672 (100%) 17664 ( 0%) r10f1e5 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) ------------- -------------------- ------------------- (pool total) 13473044480 13472497664 (100%) 83712 ( 0%) More or less empty. Interesting... On Thu, Apr 16, 2020 at 1:11 PM Frederick Stock > wrote: Do you have more than one GPFS storage pool in the system? If you do and they align with the filesets then that might explain why moving data from one fileset to another is causing increased IO operations. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "J. Eric Wonderley" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question Date: Thu, Apr 16, 2020 12:32 PM I have filesets setup in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Apr 16 17:50:42 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 16 Apr 2020 16:50:42 +0000 Subject: [gpfsug-discuss] gpfs filesets question Message-ID: Moving data between filesets is like moving files between file systems. Normally when you move files between directories, it?s simple metadata, but with filesets (dependent or independent) is a full copy and delete of the old data. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "J. 
Eric Wonderley" Reply-To: gpfsug main discussion list Date: Thursday, April 16, 2020 at 11:32 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question I have filesets setup in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Apr 16 21:24:51 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 16 Apr 2020 20:24:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: > From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses >are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. 
Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com From mhennecke at lenovo.com Thu Apr 16 22:19:13 2020 From: mhennecke at lenovo.com (Michael Hennecke) Date: Thu, 16 Apr 2020 21:19:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - Lenovo information Message-ID: Hi, Thanks a lot Carl for these clarifications. Some additions from the Lenovo side: Lenovo *GSS* (which is no longer sold, but still fully supported) uses the socked-based Spectrum Scale Standard Edition or Advanced Edition. We provide both a 4.2 based version and a 5.0 based version of the GSS installation packages. Customers get access to the Edition they acquired with their GSS system(s), and they can choose to install the 4.2 or the 5.0 code. Lenovo GSS customers are automatically entitled for those GSS downloads. Customers who acquired a GSS system when System x was still part of IBM can also obtain the latest GSS installation packages from Lenovo (v4 and v5), but will need to provide a valid proof of entitlement of their Spectrum Scale licenses before being granted access. Lenovo *DSS-G* uses capacity-based licensing (per-disk or per-TB), with the Spectrum Scale Data Access Edition or Data Management Edition. For DSS-G we also provide both a 4.2 based installation package and a 5.0 based installation package, and customers can choose which one to install. Note that the Lenovo installation tarballs for DSS-G are named for example "dss-g-2.6a-standard-5.0.tgz" (installation package includes the capacity-based DAE) or "dss-g-2.6a-advanced-5.0.tgz" (installation package includes the capacity-based DME), so the Lenovo naming convention for the DSS-G packages is not identical with the naming of the Scale Edition that it includes. PS: There is no path to change a GSS system from a socket license to a capacity license. Replacing it with a DSS-G will of course also replace the licenses, as DSS-G comes with capacity-based licenses. Mit freundlichen Gr?ssen / Best regards, Michael Hennecke HPC Chief Technologist - HPC and AI Business Unit? -- Lenovo Global Technology (Germany) GmbH * Am Zehnthof 77 * D-45307 Essen * Germany Gesch?ftsf?hrung: Colm Gleeson, Christophe Laurent * Sitz der Gesellschaft: Stuttgart * HRB-Nr.: 758298, AG Stuttgart -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: [External] Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. 
The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Fri Apr 17 00:48:18 2020 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Thu, 16 Apr 2020 23:48:18 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: Plus one. It is not just volume licensing. The socket licensing costs have gone through the roof, at least in Australia. IBM tempts you with a cheap introduction and then once you are hooked, ramps up the price. They are counting on the migration costs outweighing the licensing fee increases. Unfortunately, our management won't stand for this business approach, so we get to do the migrations (boring as the proverbial bat ... you know what.) I think this forum is a good place to discuss it. IBM and customers on here need to know all about it. It is a user group after all and moving away from a product is part of the lifecycle. We were going to use GPFS for HPC scratch but went to market and ended up with BeeGFS. Further pricing pressure has meant GPFS is being phased out in all areas. 
We split our BeeGFS cluster of NVMe servers in half on arrival and have been trying other filesystems on half of it. We were going to try GPFS ECE but given the pricing we have been quoted have decided not to waste our time. We are gearing up to try Lustre on it. We have also noted the feature improvements with Lustre. Maybe if IBM had saved the money that a rebranding costs (GPFS to Spectrum Scale) they would not have had to crank up the price of GPFS? Cheers, Greg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 1:27 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale licensing Hello All, As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). We would really like to stay with SS/GPFS and have been big advocates of SS/GPFS over the years, but the capacity based licensing is pushing us into evaluating alternatives. I realize this may not be proper to discuss this directly in this email list, so feel free to email directly with your suggestions or your plans. Thanks and kind regards, Dean -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Fri Apr 17 01:40:22 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Fri, 17 Apr 2020 00:40:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. 
The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sedl at re-store.net Fri Apr 17 03:06:57 2020 From: sedl at re-store.net (Michael Sedlmayer) Date: Fri, 17 Apr 2020 02:06:57 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". 
However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. 
Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From steve.hindmarsh at crick.ac.uk Fri Apr 17 08:35:51 2020 From: steve.hindmarsh at crick.ac.uk (Steve Hindmarsh) Date: Fri, 17 Apr 2020 07:35:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: , Message-ID: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> We are caught in the same position (12 PB on DDN GridScaler) and currently unable to upgrade to v5. If the position between IBM and DDN can?t be resolved, an extension of meaningful support from IBM (i.e. critical patches not just a sympathetic ear) for OEM licences would make a *huge* difference to those of us who need to provide critical production research data services on current equipment for another few years at least - with appropriate paid vendor support of course. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute Sent from my mobile On 17 Apr 2020, at 03:07, Michael Sedlmayer wrote: ?One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. 
I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. 
Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Apr 17 09:19:39 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 17 Apr 2020 08:19:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> References: , , <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> Message-ID: Especially with the pandemic. No one is exactly sure what next year?s budget is going to look like. I wouldn?t expect to be buying large amounts of storage to replace so far perfectly good storage. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Apr 17, 2020, at 03:36, Steve Hindmarsh wrote: ? We are caught in the same position (12 PB on DDN GridScaler) and currently unable to upgrade to v5. If the position between IBM and DDN can?t be resolved, an extension of meaningful support from IBM (i.e. critical patches not just a sympathetic ear) for OEM licences would make a *huge* difference to those of us who need to provide critical production research data services on current equipment for another few years at least - with appropriate paid vendor support of course. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute Sent from my mobile On 17 Apr 2020, at 03:07, Michael Sedlmayer wrote: ?One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. 
We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. 
If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Apr 17 10:29:52 2020 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 17 Apr 2020 09:29:52 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> References: , <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> Message-ID: We're in the same boat. I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) but I really wish they would sort something out. Rob On Fri, 2020-04-17 at 07:35 +0000, Steve Hindmarsh wrote: > CAUTION: This email originated from outside of the ICR. Do not click > links or open attachments unless you recognize the sender's email > address and know the content is safe. > > We are caught in the same position (12 PB on DDN GridScaler) and > currently unable to upgrade to v5. 
> > If the position between IBM and DDN can?t be resolved, an extension > of meaningful support from IBM (i.e. critical patches not just a > sympathetic ear) for OEM licences would make a *huge* difference to > those of us who need to provide critical production research data > services on current equipment for another few years at least - with > appropriate paid vendor support of course. > > Best, > Steve > > Steve Hindmarsh > Head of Scientific Computing > The Francis Crick Institute -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From yeep at robust.my Fri Apr 17 11:31:49 2020 From: yeep at robust.my (T.A. Yeep) Date: Fri, 17 Apr 2020 18:31:49 +0800 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hi Carl, I'm confused here, in the previous email it was said *And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs.* But then you mentioned in below email that: But new customers and new OEM systems are *all licensed by Capacity. This also applies to IBM's own ESS*: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with *a new ESS, that will come with capacity licenses*. Now the question, ESS is license per Drive or by capacity? .On Fri, Apr 17, 2020 at 4:25 AM Carl Zetie - carlz at us.ibm.com < carlz at us.ibm.com> wrote: > > From my understanding existing customers from DDN, Lenovo, etc. that > have v4 with socket based licenses > >are not entitled v5 licenses socket licenses. Is that a correct > understanding? > > It is not, and I apologize in advance for the length of this explanation. > I want to be precise and as transparent as possible while respecting the > confidentiality of our OEM partners and the contracts we have with them, > and there is a lot of misinformation out there. > > The short version is that the same rules apply to DDN, Lenovo, and other > OEM systems that apply to IBM ESS. You can update your system in place and > keep your existing metric, as long as your vendor can supply you with V5 > for that hardware. The update from V4 to V5 is not relevant. > > > The long version: > > We apply the same standard to our OEM's systems as to our own ESS: they > can upgrade their existing customers on their existing OEM systems to V5 > and stay on Sockets, *provided* that the OEM has entered into an OEM > license for Scale V5 and can supply it, and *provided* that the hardware is > still supported by the software stack. But new customers and new OEM > systems are all licensed by Capacity. This also applies to IBM's own ESS: > you can keep upgrading your old (if hardware is supported) gen 1 ESS on > Sockets, but if you replace it with a new ESS, that will come with capacity > licenses. 
(Lenovo may want to chime in about their own GSS customers here, > who have Socket licenses, and DSS-G customers, who have Capacity licenses). > Existing systems that originally shipped with Socket licenses are > "grandfathered in". > > And of course, if you move from a Lenovo system to an IBM system, or from > an IBM system to a Lenovo system, or any other change of suppliers, that > new system will come with capacity licenses, simply because it's a new > system. If you're replacing an old system running with V4 with a new one > running V5 it might look like you are forced to switch to update, but > that's not the case: if you replace an old "grandfathered in" system that > you had already updated to V5 on Sockets, your new system would *still* > come with Capacity licenses - again, because it's a new system. > > Now where much of the confusion occurs is this: What if your supplier does > not provide an update to V5 at all, *neither as Capacity nor Socket > licenses*? Then you have no choice: to get to V5, you have to move to a new > supplier, and consequently you have to move to Capacity licensing. But once > again, it's not that moving from V4 to V5 requires a change of metric; it's > moving to a new system from a new supplier. > > I hope that helps to make things clearer. > > > > Carl Zetie > Program Director > Offering Management > Spectrum Scale > ---- > (919) 473 3318 ][ Research Triangle Park > carlz at us.ibm.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. Yeep*Mobile: 016-719 8506 | Tel/Fax: 03-6261 7237 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Apr 17 11:50:22 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 11:50:22 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: On 17/04/2020 11:31, T.A. Yeep wrote: > Hi Carl, > > I'm confused here, in the previous email it was said *And for ESS, it is > licensed?Per Drive with different prices for HDDs and SSDs.* > > But then you mentioned in below email that: > But new customers and new OEM systems are *all licensed by Capacity. > This also applies to IBM's own ESS*: you can keep upgrading your old (if > hardware is supported) gen 1 ESS on Sockets, but if you replace it with > *a new ESS, that will come with capacity licenses*. > > Now the question, ESS is license per Drive or by capacity? > Well by drive is "capacity" based licensing unless you have some sort of magical infinite capacity drives :-) Under the PVU scheme if you know what you are doing you could game the system. For example get a handful of servers get PVU licenses for them create a GPFS file system handing off the back using say Fibre Channel and cheap FC attached arrays (Dell MD3000 series springs to mind) and then hang many PB off the back. I could using this scheme create a 100PB filesystem for under a thousand PVU of GPFS server licenses. Add in another cluster for protocol nodes and if you are not mounting on HPC nodes that's a winner :-) In a similar manner I use a pimped out ancient Dell R300 with dual core Xeon for backing up my GPFS filesystem because it's 100PVU of TSM licensing and I am cheap, and besides it is more than enough grunt for the job. A new machine would be 240 PVU minimum (4*70). 
I plan on replacing the PERC SAS6 card with a H710 and new internal cabling to run RHEL8 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Fri Apr 17 12:02:44 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 12:02:44 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: On 16/04/2020 04:26, Flanders, Dean wrote: > Hello All, > > As IBM has completely switched to capacity based licensing in order to > use SS v5 I was wondering how others are dealing with this? We do not > find the capacity based licensing sustainable. Our long term plan is to > migrate away from SS v5 to Lustre, and based on the Lustre roadmap we > have seen it should have the features we need within the next ~1 year > (we are fortunate to have good contacts). The problem is the features of Lustre that are missing in GPFS :-) For example have they removed the Lustre feature where roughly biannually the metadata server kernel panics introducing incorrectable corruption into the file system that will within six months cause constant crashes of the metadata node to the point where the file system is unusable? In best slashdot car analogy GPFS is like driving round in a Aston Martin DB9, where Lustre is like having a Ford Pinto. You will never be happy with Pinto in my experience having gone from the DB9 to the Pinto and back to the DB9. That said if you use Lustre as a high performance scratch file system fro HPC and every ~6 months do a shutdown and upgrade, and at the same time reformat your Lustre file system you will be fine. Our experience with Lustre was so bad we specifically excluded it as an option for our current HPC system when it went out to tender. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From carlz at us.ibm.com Fri Apr 17 13:10:00 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 12:10:00 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <19F21F2C-901E-4A04-AB94-740E2C2B5205@us.ibm.com> >Now the question, ESS is license per Drive or by capacity? I apologize for the confusion. Within IBM Storage when we say ?capacity? licensing we use that as an umbrella term for both Per TB/PB *or* Per Drive (HDD or SSD). This is contrasted with ?processor? metrics including Socket and the even older PVU licensing. And yes, we IBMers should be more careful about our tendency to use terminology that nobody else in the world does. (Don?t get me started on terabyte versus tebibyte?). So, for the sake of completeness and for anybody reviewing the thread in the future: * Per Drive is available with ESS, Lenovo DSS, and a number of other OEM solutions*. * Per TB/Per PB is available for software defined storage, including some OEM solutions - basically anywhere where figuring out the number of physical drives is infeasible.** * You can if you wish license ESS with Per TB/PB, for example if you want to have a single pool of licensing across an environment that mixes software-defined, ESS, or public cloud; or if you want to include your ESS licenses in an ELA. This is almost always more expensive than Per Drive, but some customers are willing to pay for the privilege of the flexibility. I hope that helps. 
*(In some cases the customer may not even know it because the OEM solution is sold as a whole with a bottom line price, and the customer does not see a line item price for Scale. In at least one case, the vertical market solution doesn?t even expose the fact that the storage is provided by Scale.) **(Imagine trying to figure out the ?real? number of drives in a high-end storage array that does RAIDing, hides some drives as spares, offers thin provisioning, etc. Or on public cloud where the ?drives? are all virtual.) Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1886717044] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From carlz at us.ibm.com Fri Apr 17 13:16:38 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 12:16:38 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> Rob Horton wrote: >I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) >but I really wish they would sort something out. Yes, it?s a pain. IBM and DDN are trying very hard to work something out, but it?s hard to get all the ?I?s dotted and ?T?s crossed with our respective legal and exec reviewers so that when we do say something it will be complete, clear, and final; and not require long, baroque threads for people to figure out where exactly they are? I wish I could say more, but I need to respect the confidentiality of the relationship and the live discussion. In the meantime, I thank you for your patience, and ask that you not believe any rumors you might hear, because whatever they are, they are wrong (or at least incomplete). In this situation, as a wise man once observed, ?those who Say don?t Know; those who Know don?t Say?. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_749317756] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From aaron.knister at gmail.com Fri Apr 17 14:15:07 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 17 Apr 2020 09:15:07 -0400 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: Yeah, I had similar experiences in the past (over a decade ago) with Lustre and was heavily heavily anti-Lustre. That said, I just finished several weeks of what I?d call grueling testing of DDN Lustre and GPFS on the same hardware and I?m reasonably convinced much of that is behind us now (things like stability, metadata performance, random I/O performance just don?t appear to be issues anymore and in some cases these operations are now faster in Lustre). Full disclosure, I work for DDN, but the source of my paycheck has relatively little bearing on my technical opinions. All I?m saying is for me to honestly believe Lustre is worth another shot after the experiences I had years ago is significant. I do think it?s key to have a vendor behind you, vs rolling your own. I have seen that make a difference. 
I?m happy to take any further conversation/questions offline, I?m in no way trying to turn this into a marketing campaign. Sent from my iPhone > On Apr 17, 2020, at 07:02, Jonathan Buzzard wrote: > > ?On 16/04/2020 04:26, Flanders, Dean wrote: >> Hello All, >> As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). > > The problem is the features of Lustre that are missing in GPFS :-) > > For example have they removed the Lustre feature where roughly biannually the metadata server kernel panics introducing incorrectable corruption into the file system that will within six months cause constant crashes of the metadata node to the point where the file system is unusable? > > In best slashdot car analogy GPFS is like driving round in a Aston Martin DB9, where Lustre is like having a Ford Pinto. You will never be happy with Pinto in my experience having gone from the DB9 to the Pinto and back to the DB9. > > That said if you use Lustre as a high performance scratch file system fro HPC and every ~6 months do a shutdown and upgrade, and at the same time reformat your Lustre file system you will be fine. > > Our experience with Lustre was so bad we specifically excluded it as an option for our current HPC system when it went out to tender. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From carlz at us.ibm.com Fri Apr 17 14:15:07 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 13:15:07 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <82819CD0-0BF7-41A6-9896-32AF88744D4B@us.ibm.com> Dean Flanders: > Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, > but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, > eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). > In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when > we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking > of these issues in their long term planning. Again, this isn?t quite correct, and I really want the archive of this thread to be completely correct when people review it in the future. As an existing customer of DDN, the problem GridScaler customers in particular are facing is not Sockets vs. Capacity. It is simply that DDN is not an OEM licensee for Scale V5. So DDN cannot upgrade your GridScaler to V5, *neither on Sockets nor on Capacity*. Then if you go to another supplier for V5, you are a new customer to that supplier. (Some of you out there are, I know, multi-sourcing your Scale systems, so may be an ?existing customer? of several Scale suppliers). 
And again, it is not correct that eventually all customers will be forced to capacity licensing. Those of you on Scale Standard and Scale Advanced software, which are not tied to specific systems or hardware, can continue on those licenses. There is no plan to require those people to migrate. By contrast, OEM licenses (and ESS licenses) were always sold as part of a system and attached to that system -- one of the things that makes those licenses cheaper than software licenses that live forever and float from system to system. It is also not true that there is a ?diminishing number of OEMs? selling V5. Everybody that sold V4 has added V5 to their contract, as far as I am aware -- except DDN. And we have added a number of additional OEMs in the past couple of years (some of them quite invisibly as Scale is embedded deep in their solution and they want their own brand front and center) and a couple more big names are in development that I can?t mention until they are ready to announce themselves. We also have a more diverse OEM model: as well as storage vendors that include Scale in a storage solution, we have various embedded vertical solutions, backup solutions, and cloud-based service offerings using Scale. Even Dell is selling a Scale solution now via our OEM Arcastream. Again, DDN and IBM are working together to find a path forward for GridScaler owners to get past this problem, and once again I ask for your patience as we get the details right. Regards Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_50537] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From steve.hindmarsh at crick.ac.uk Fri Apr 17 14:33:10 2020 From: steve.hindmarsh at crick.ac.uk (Steve Hindmarsh) Date: Fri, 17 Apr 2020 13:33:10 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> References: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> Message-ID: Hi Carl, Thanks for the update which is very encouraging. I?m happy to sit tight and wait for an announcement. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Carl Zetie - carlz at us.ibm.com Sent: Friday, April 17, 2020 1:16:38 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Rob Horton wrote: >I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) >but I really wish they would sort something out. Yes, it?s a pain. IBM and DDN are trying very hard to work something out, but it?s hard to get all the ?I?s dotted and ?T?s crossed with our respective legal and exec reviewers so that when we do say something it will be complete, clear, and final; and not require long, baroque threads for people to figure out where exactly they are? I wish I could say more, but I need to respect the confidentiality of the relationship and the live discussion. In the meantime, I thank you for your patience, and ask that you not believe any rumors you might hear, because whatever they are, they are wrong (or at least incomplete). 
In this situation, as a wise man once observed, ?those who Say don?t Know; those who Know don?t Say?. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_749317756] The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From jonathan.buzzard at strath.ac.uk Fri Apr 17 14:44:29 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 14:44:29 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> On 17/04/2020 14:15, Aaron Knister wrote: > Yeah, I had similar experiences in the past (over a decade ago) with > Lustre and was heavily heavily anti-Lustre. That said, I just > finished several weeks of what I?d call grueling testing of DDN > Lustre and GPFS on the same hardware and I?m reasonably convinced > much of that is behind us now (things like stability, metadata > performance, random I/O performance just don?t appear to be issues > anymore and in some cases these operations are now faster in Lustre). Several weeks testing frankly does not cut the mustard to demonstrate stability. Our Lustre would run for months on end then boom, metadata server kernel panics. Sometimes but not always this would introduce the incorrectable file system corruption. You are going to need to have several years behind it to claim it is now stable. At this point I would note that basically a fsck on Lustre is not possible. Sure there is a somewhat complicated procedure for it, but firstly it is highly likely to take weeks to run, and even then it might not be able to actually fix the problem. > Full disclosure, I work for DDN, but the source of my paycheck has > relatively little bearing on my technical opinions. All I?m saying is > for me to honestly believe Lustre is worth another shot after the > experiences I had years ago is significant. I do think it?s key to > have a vendor behind you, vs rolling your own. I have seen that make > a difference. I?m happy to take any further conversation/questions > offline, I?m in no way trying to turn this into a marketing > campaign. Lustre is as of two years ago still behind GPFS 3.0 in terms of features and stability in my view. The idea it has caught up to GPFS 5.x in the last two years is in my view errant nonsense, software development just does not work like that. Let me put it another way, in our experience the loss of compute capacity from the downtime of Lustre exceeded the cost of GPFS licenses. That excludes the wage costs of researches twiddling their thumbs whilst the system was restored to working order. If I am being cynical if you can afford DDN storage in the first place stop winging about GPFS license costs. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From christophe.darras at atempo.com Fri Apr 17 15:00:10 2020 From: christophe.darras at atempo.com (Christophe Darras) Date: Fri, 17 Apr 2020 14:00:10 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> References: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> Message-ID: Hey Ladies and Gent, For some people here, it seems GPFS is like a religion? A lovely weekend to all of you, Kind Regards, Chris -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: vendredi 17 avril 2020 14:44 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing On 17/04/2020 14:15, Aaron Knister wrote: > Yeah, I had similar experiences in the past (over a decade ago) with > Lustre and was heavily heavily anti-Lustre. That said, I just finished > several weeks of what I?d call grueling testing of DDN Lustre and GPFS > on the same hardware and I?m reasonably convinced much of that is > behind us now (things like stability, metadata performance, random I/O > performance just don?t appear to be issues anymore and in some cases > these operations are now faster in Lustre). Several weeks testing frankly does not cut the mustard to demonstrate stability. Our Lustre would run for months on end then boom, metadata server kernel panics. Sometimes but not always this would introduce the incorrectable file system corruption. You are going to need to have several years behind it to claim it is now stable. At this point I would note that basically a fsck on Lustre is not possible. Sure there is a somewhat complicated procedure for it, but firstly it is highly likely to take weeks to run, and even then it might not be able to actually fix the problem. > Full disclosure, I work for DDN, but the source of my paycheck has > relatively little bearing on my technical opinions. All I?m saying is > for me to honestly believe Lustre is worth another shot after the > experiences I had years ago is significant. I do think it?s key to > have a vendor behind you, vs rolling your own. I have seen that make a > difference. I?m happy to take any further conversation/questions > offline, I?m in no way trying to turn this into a marketing campaign. Lustre is as of two years ago still behind GPFS 3.0 in terms of features and stability in my view. The idea it has caught up to GPFS 5.x in the last two years is in my view errant nonsense, software development just does not work like that. Let me put it another way, in our experience the loss of compute capacity from the downtime of Lustre exceeded the cost of GPFS licenses. That excludes the wage costs of researches twiddling their thumbs whilst the system was restored to working order. If I am being cynical if you can afford DDN storage in the first place stop winging about GPFS license costs. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From yeep at robust.my Fri Apr 17 15:01:05 2020 From: yeep at robust.my (T.A. 
Yeep) Date: Fri, 17 Apr 2020 22:01:05 +0800 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hi JAB, Sound interesting, however, I'm actually a newcomer to Scale, I wish I could share the joy of mixing that. I guess maybe it is something similar to LSF RVU/UVUs? Thanks for sharing your experience anyway. Hi Carl, I just want to let you know that I have got your explanation, and I understand it now. Thanks. Not sure If I should always reply a "thank you" or "I've got it" in the mailing list, or better just do it privately. Same I'm new to mailing list too, so please let me know if I should not reply it publicly. On Fri, Apr 17, 2020 at 6:50 PM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 17/04/2020 11:31, T.A. Yeep wrote: > > Hi Carl, > > > > I'm confused here, in the previous email it was said *And for ESS, it is > > licensed Per Drive with different prices for HDDs and SSDs.* > > > > But then you mentioned in below email that: > > But new customers and new OEM systems are *all licensed by Capacity. > > This also applies to IBM's own ESS*: you can keep upgrading your old (if > > hardware is supported) gen 1 ESS on Sockets, but if you replace it with > > *a new ESS, that will come with capacity licenses*. > > > > Now the question, ESS is license per Drive or by capacity? > > > > Well by drive is "capacity" based licensing unless you have some sort of > magical infinite capacity drives :-) > > Under the PVU scheme if you know what you are doing you could game the > system. For example get a handful of servers get PVU licenses for them > create a GPFS file system handing off the back using say Fibre Channel > and cheap FC attached arrays (Dell MD3000 series springs to mind) and > then hang many PB off the back. I could using this scheme create a 100PB > filesystem for under a thousand PVU of GPFS server licenses. Add in > another cluster for protocol nodes and if you are not mounting on HPC > nodes that's a winner :-) > > In a similar manner I use a pimped out ancient Dell R300 with dual core > Xeon for backing up my GPFS filesystem because it's 100PVU of TSM > licensing and I am cheap, and besides it is more than enough grunt for > the job. A new machine would be 240 PVU minimum (4*70). I plan on > replacing the PERC SAS6 card with a H710 and new internal cabling to run > RHEL8 :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. Yeep*Mobile: 016-719 8506 | Tel/Fax: 03-6261 7237 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Sat Apr 18 16:04:53 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Sat, 18 Apr 2020 11:04:53 -0400 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: Is this still true if the source and target fileset are both in the same storage pool? It seems like they could just move the metadata? Especially in the case of dependent filesets where the metadata is actually in the same allocation area for both the source and target. Maybe this just doesn?t happen often enough to optimize? 
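For anyone who wants to check this on their own cluster, a rough way to tell whether a given mv stayed metadata-only is to compare inode numbers before and after the move. The paths below are made up - adjust them to two filesets linked under the same filesystem:

src=/gpfs/home/filesetA/bigfile
dst=/gpfs/home/filesetB/bigfile
before=$(stat -c %i "$src")
mv "$src" "$dst"
after=$(stat -c %i "$dst")
if [ "$before" = "$after" ]; then
    echo "same inode ($before): metadata-only rename"
else
    echo "inode changed ($before -> $after): data was copied"
fi

If the inode changes, mv has fallen back to the copy-and-delete behaviour described below rather than a simple rename.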
-- Stephen > On Apr 16, 2020, at 12:50 PM, Oesterlin, Robert wrote: > > Moving data between filesets is like moving files between file systems. Normally when you move files between directories, it?s simple metadata, but with filesets (dependent or independent) is a full copy and delete of the old data. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > on behalf of "J. Eric Wonderley" > > Reply-To: gpfsug main discussion list > > Date: Thursday, April 16, 2020 at 11:32 AM > To: gpfsug main discussion list > > Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question > > I have filesets setup in a filesystem...looks like: > [root at cl005 ~]# mmlsfileset home -L > Filesets in file system 'home': > Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment > root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset > hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 > predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 > HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 > socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 > arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 > arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 > > I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From st.graf at fz-juelich.de Mon Apr 20 09:29:17 2020 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 20 Apr 2020 10:29:17 +0200 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: Hi, we recognized this behavior when we tried to move HSM migrated files between filesets. This cases a recall. Very annoying when the data are afterword stored on the same pools and have to be migrated back to tape. @IBM: should we open a RFE to address this? Stephan Am 18.04.2020 um 17:04 schrieb Stephen Ulmer: > Is this still true if the source and target fileset are both in the same > storage pool? It seems like they could just move the metadata? > Especially in the case of dependent filesets where the metadata is > actually in the same allocation area for both the source and target. > > Maybe this just doesn?t happen often enough to optimize? > > -- > Stephen > > > >> On Apr 16, 2020, at 12:50 PM, Oesterlin, Robert >> > wrote: >> >> Moving data between filesets is like moving files between file >> systems. Normally when you move files between directories, it?s simple >> metadata, but with filesets (dependent or independent) is a full copy >> and delete of the old data. >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> *From:*> > on behalf of "J. >> Eric Wonderley" > >> *Reply-To:*gpfsug main discussion list >> > > >> *Date:*Thursday, April 16, 2020 at 11:32 AM >> *To:*gpfsug main discussion list > > >> *Subject:*[EXTERNAL] [gpfsug-discuss] gpfs filesets question >> I have filesets setup in a filesystem...looks like: >> [root at cl005 ~]# mmlsfileset home -L >> Filesets in file system 'home': >> Name ? ? ? ? ? ? ? ? ? ? ? ? ? ?Id ? ? ?RootInode ?ParentId Created >> ? ? ? ? ? ? ? ? ? ?InodeSpace ? ? ?MaxInodes ? ?AllocInodes Comment >> root ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0 ? 
? ? ? ? ? ?3 ? ? ? ?-- Tue Jun 30 >> 07:54:09 2015 ? ? ? ?0 ? ? ? ? ? ?402653184 ? ? ?320946176 root fileset >> hess ? ? ? ? ? ? ? ? ? ? ? ? ? ? 1 ? ? ?543733376 ? ? ? ? 0 Tue Jun 13 >> 14:56:13 2017 ? ? ? ?0 ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ?0 >> predictHPC ? ? ? ? ? ? ? ? ? ? ? 2 ? ? ? ?1171116 ? ? ? ? 0 Thu Jan ?5 >> 15:16:56 2017 ? ? ? ?0 ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ?0 >> HYCCSIM ? ? ? ? ? ? ? ? ? ? ? ? ?3 ? ? ?544258049 ? ? ? ? 0 Wed Jun 14 >> 10:00:41 2017 ? ? ? ?0 ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ?0 >> socialdet ? ? ? ? ? ? ? ? ? ? ? ?4 ? ? ?544258050 ? ? ? ? 0 Wed Jun 14 >> 10:01:02 2017 ? ? ? ?0 ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ?0 >> arc ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?5 ? ? ? ?1171073 ? ? ? ? 0 Thu Jan ?5 >> 15:07:09 2017 ? ? ? ?0 ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ?0 >> arcadm ? ? ? ? ? ? ? ? ? ? ? ? ? 6 ? ? ? ?1171074 ? ? ? ? 0 Thu Jan ?5 >> 15:07:10 2017 ? ? ? ?0 ? ? ? ? ? ? ? ? ? ?0 ? ? ? ? ? ? ?0 >> I beleive these are dependent filesets.? Dependent on the root >> fileset.? ?Anyhow a user wants to move a large amount of data from one >> fileset to another.? ?Would this be a metadata only operation?? He has >> attempted to small amount of data and has noticed some thrasing. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss atspectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5360 bytes Desc: S/MIME Cryptographic Signature URL: From olaf.weiser at de.ibm.com Mon Apr 20 11:54:06 2020 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 20 Apr 2020 10:54:06 +0000 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From skariapaul at gmail.com Wed Apr 22 04:40:28 2020 From: skariapaul at gmail.com (PS K) Date: Wed, 22 Apr 2020 11:40:28 +0800 Subject: [gpfsug-discuss] S3, S3A & S3n support Message-ID: Hi, Does SS object protocol support S3a and S3n? Regards Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Wed Apr 22 09:19:10 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Wed, 22 Apr 2020 04:19:10 -0400 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) Message-ID: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> In case you missed (the forum has been pretty quiet about this one), CVE-2020-4273 had an update yesterday: https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E If you can't do the upgrade now, at least apply the mitigation to the client nodes generally exposed to unprivileged users: Check the setuid bit: ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}') Apply the mitigation: ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s /usr/lpp/mmfs/bin/"$9)}' Verification: ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}') All the best Jaime . . . 
************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From st.graf at fz-juelich.de Wed Apr 22 10:02:59 2020 From: st.graf at fz-juelich.de (Stephan Graf) Date: Wed, 22 Apr 2020 11:02:59 +0200 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Hi I took a lookat the "Readme and Release notes for release 5.0.4.3 IBM Spectrum Scale 5.0.4.3 Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" But I did not find the entry which mentioned the "For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is mentioned on the "Security Bulletin: A vulnerability has been identified in IBM Spectrum Scale where an unprivileged user could execute commands as root ( CVE-2020-4273)" page. shouldn't it be mentioned there? Stephan Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > In case you missed (the forum has been pretty quiet about this one), > CVE-2020-4273 had an update yesterday: > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > If you can't do the upgrade now, at least apply the mitigation to the > client nodes generally exposed to unprivileged users: > > Check the setuid bit: > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > /usr/lpp/mmfs/bin/"$9)}') > > Apply the mitigation: > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > /usr/lpp/mmfs/bin/"$9)}' > > Verification: > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > /usr/lpp/mmfs/bin/"$9)}') > > All the best > Jaime > > . > . > .??????? ************************************ > ????????? TELL US ABOUT YOUR SUCCESS STORIES > ???????? http://www.scinethpc.ca/testimonials > ???????? ************************************ > --- > Jaime Pinto - Storage Analyst > SciNet HPC Consortium - Compute/Calcul Canada > www.scinet.utoronto.ca - www.computecanada.ca > University of Toronto > 661 University Ave. (MaRS), Suite 1140 > Toronto, ON, M5G1M1 > P: 416-978-2755 > C: 416-505-1477 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5360 bytes Desc: S/MIME Cryptographic Signature URL: From knop at us.ibm.com Wed Apr 22 16:42:54 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 22 Apr 2020 15:42:54 +0000 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: , <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... 
URL: From thakur.hpc at gmail.com Wed Apr 22 19:23:53 2020 From: thakur.hpc at gmail.com (Bhupender thakur) Date: Wed, 22 Apr 2020 11:23:53 -0700 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Has IBM released or does IBM plan to release a fix in the 5.0.3.x branch? On Wed, Apr 22, 2020 at 8:45 AM Felipe Knop wrote: > Stephan, > > Security bulletins need to go through an internal process, including legal > review. In addition, we are normally required to ensure the fix is > available for all releases before the security bulletin can be published. > Because of that, we normally don't list details for security fixes in > either the readmes or APARs, since the information can only be disclosed in > the bulletin itself. > > ---- > The bulletin below has: > > If you cannot apply the latest level of service, contact IBM Service for > an efix: > > - For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438 > > - For IBM Spectrum Scale V4.2.0.0 through V4.2.3.20, reference APAR > IJ23426 > "V5.0.0.0 through V5.0.4.1" should have been "V5.0.0.0 through V5.0.4.2". > (I have asked the text to be corrected) > > > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Stephan Graf > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 5:04 AM > > Hi > > I took a lookat the "Readme and Release notes for release 5.0.4.3 IBM > Spectrum Scale 5.0.4.3 > Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" > But I did not find the entry which mentioned the "For IBM Spectrum Scale > V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is > mentioned on the "Security Bulletin: A vulnerability has been identified > in IBM Spectrum Scale where an unprivileged user could execute commands > as root ( CVE-2020-4273)" page. > > shouldn't it be mentioned there? > > Stephan > > > Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > > In case you missed (the forum has been pretty quiet about this one), > > CVE-2020-4273 had an update yesterday: > > > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > > > > If you can't do the upgrade now, at least apply the mitigation to the > > client nodes generally exposed to unprivileged users: > > > > Check the setuid bit: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > Apply the mitigation: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > > /usr/lpp/mmfs/bin/"$9)}' > > > > Verification: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > All the best > > Jaime > > > > . > > . > > . ************************************ > > TELL US ABOUT YOUR SUCCESS STORIES > > http://www.scinethpc.ca/testimonials > > ************************************ > > --- > > Jaime Pinto - Storage Analyst > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > 661 University Ave. 
(MaRS), Suite 1140 > > Toronto, ON, M5G1M1 > > P: 416-978-2755 > > C: 416-505-1477 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Apr 22 21:05:49 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 22 Apr 2020 20:05:49 +0000 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: , <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... URL: From thakur.hpc at gmail.com Wed Apr 22 21:47:30 2020 From: thakur.hpc at gmail.com (Bhupender thakur) Date: Wed, 22 Apr 2020 13:47:30 -0700 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Thanks for the clarification Felipe. On Wed, Apr 22, 2020 at 1:06 PM Felipe Knop wrote: > Bhupender, > > PTFs for the 5.0.3 branch are no longer produced (as is the case for > 5.0.2, 5.0.1, and 5.0.0), but efixes for 5.0.3 can be requested. When > requesting the efix, please indicate the APAR number listed in bulletin > below, as well as the location of the bulletin itself, just in case: > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Bhupender thakur > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 2:24 PM > > Has IBM released or does IBM plan to release a fix in the 5.0.3.x branch? > > On Wed, Apr 22, 2020 at 8:45 AM Felipe Knop wrote: > > Stephan, > > Security bulletins need to go through an internal process, including legal > review. In addition, we are normally required to ensure the fix is > available for all releases before the security bulletin can be published. > Because of that, we normally don't list details for security fixes in > either the readmes or APARs, since the information can only be disclosed in > the bulletin itself. > > ---- > The bulletin below has: > > If you cannot apply the latest level of service, contact IBM Service for > an efix: > > - For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438 > > - For IBM Spectrum Scale V4.2.0.0 through V4.2.3.20, reference APAR > IJ23426 > "V5.0.0.0 through V5.0.4.1" should have been "V5.0.0.0 through V5.0.4.2". 
> (I have asked the text to be corrected) > > > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Stephan Graf > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 5:04 AM > > Hi > > I took a lookat the "Readme and Release notes for release 5.0.4.3 IBM > Spectrum Scale 5.0.4.3 > Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" > But I did not find the entry which mentioned the "For IBM Spectrum Scale > V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is > mentioned on the "Security Bulletin: A vulnerability has been identified > in IBM Spectrum Scale where an unprivileged user could execute commands > as root ( CVE-2020-4273)" page. > > shouldn't it be mentioned there? > > Stephan > > > Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > > In case you missed (the forum has been pretty quiet about this one), > > CVE-2020-4273 had an update yesterday: > > > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > > > > If you can't do the upgrade now, at least apply the mitigation to the > > client nodes generally exposed to unprivileged users: > > > > Check the setuid bit: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > Apply the mitigation: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > > /usr/lpp/mmfs/bin/"$9)}' > > > > Verification: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > All the best > > Jaime > > > > . > > . > > . ************************************ > > TELL US ABOUT YOUR SUCCESS STORIES > > http://www.scinethpc.ca/testimonials > > ************************************ > > --- > > Jaime Pinto - Storage Analyst > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > 661 University Ave. (MaRS), Suite 1140 > > Toronto, ON, M5G1M1 > > P: 416-978-2755 > > C: 416-505-1477 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Apr 22 23:34:33 2020 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 22 Apr 2020 22:34:33 +0000 Subject: [gpfsug-discuss] Is there a difference in suspend and empty NSD state? 
Message-ID: Hello all, Looking at the man page, it is fairly ambiguous as to these NSD states actually being different (and if not WHY have to names for the same thing?!): suspend or empty Instructs GPFS to stop allocating space on the specified disk. Put a disk in this state when you are preparing to remove the file system data from the disk or if you want to prevent new data from being put on the disk. This is a user-initiated state that GPFS never enters without an explicit command to change the disk state. Existing data on a suspended disk may still be read or updated. A disk remains in a suspended or to be emptied state until it is explicitly resumed. Restarting GPFS or rebooting nodes does not restore normal access to a suspended disk. And from the examples lower in the page: Note: In product versions earlier than V4.1.1, the mmlsdisk command lists the disk status as suspended. In product versions V4.1.1 and later, the mmlsdisk command lists the disk status as to be emptied with both mmchdisk suspend or mmchdisk empty commands. And really what I currently want to do is suspend a set of disks, and then mark a different set of disks as "to be emptied". Then I will run a mmrestripefs operation to move the data off of the "to be emptied" disks, but not onto the suspended disks (which will also be removed from the file system in the near future). Once the NSDs are emptied then it will be a very (relatively) fast mmdeldisk operation. So is that possible? As you can likely tell, I don't have enough space to just delete both sets of disks at once during a (yay!) full file system migration to the new GPFS 5.x version. Thought this might be useful to others, so posted here. Thanks in advance neighbors! -Bryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From brnelson at us.ibm.com Thu Apr 23 00:49:13 2020 From: brnelson at us.ibm.com (Brian Nelson) Date: Wed, 22 Apr 2020 18:49:13 -0500 Subject: [gpfsug-discuss] S3, S3A & S3n support In-Reply-To: References: Message-ID: The Spectrum Scale Object protocol only has support for the traditional S3 object storage. -Brian =================================== Brian Nelson IBM Spectrum Scale brnelson at us.ibm.com ----- Original message ----- From: PS K Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] [gpfsug-discuss] S3, S3A & S3n support Date: Wed, Apr 22, 2020 12:03 AM Hi, Does SS object protocol support S3a and S3n? Regards Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 23 11:33:34 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 23 Apr 2020 18:33:34 +0800 Subject: [gpfsug-discuss] =?utf-8?q?Is_there_a_difference_in_suspend_and_e?= =?utf-8?q?mpty_NSD=09state=3F?= In-Reply-To: References: Message-ID: Option 'suspend' is same to 'empty' if the cluster is updated to Scale 4.1.1. The option 'empty' was introduced in 4.1.1 to support disk deletion in a fast way, 'suspend' option was not removed with due consideration for previous users. > And really what I currently want to do is suspend a set of disks, > and then mark a different set of disks as ?to be emptied?. Then I > will run a mmrestripefs operation to move the data off of the ?to be > emptied? disks, but not onto the suspended disks (which will also be > removed from the file system in the near future). Once the NSDs are > emptied then it will be a very (relatively) fast mmdeldisk > operation. 
So is that possible? It's possible only if these two sets of disks belong to two different pools . If they are in the same pool, restripefs on the pool will migrate all data off these two sets of disks. If they are in two different pools, you can use mmrestripefs with -P option to migrate data off "suspended" and "to be emptied" disks in the specified data pool. Please note that system pool is special, mmrestripefs will unconditionally restripe the system pool even you specified -P option to a data pool. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. gpfsug-discuss-bounces at spectrumscale.org wrote on 2020/04/23 06:34:33: > From: Bryan Banister > To: gpfsug main discussion list > Date: 2020/04/23 06:35 > Subject: [EXTERNAL] [gpfsug-discuss] Is there a difference in > suspend and empty NSD state? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hello all, > > Looking at the man page, it is fairly ambiguous as to these NSD > states actually being different (and if not WHY have to names for > the same thing?!): > > suspend > or > empty > Instructs GPFS to stop allocating space on the specified > disk. Put a disk in this state when you are preparing to > remove the file system data from the disk or if you want > to prevent new data from being put on the disk. This is > a user-initiated state that GPFS never enters without an > explicit command to change the disk state. Existing data > on a suspended disk may still be read or updated. > > A disk remains in a suspended or to be > emptied state until it is explicitly resumed. > Restarting GPFS or rebooting nodes does not restore > normal access to a suspended disk. > > And from the examples lower in the page: > Note: In product versions earlier than V4.1.1, the > mmlsdisk command lists the disk status as > suspended. In product versions V4.1.1 and later, the > mmlsdisk command lists the disk status as to be > emptied with both mmchdisk suspend or mmchdisk > empty commands. > > > And really what I currently want to do is suspend a set of disks, > and then mark a different set of disks as ?to be emptied?. Then I > will run a mmrestripefs operation to move the data off of the ?to be > emptied? disks, but not onto the suspended disks (which will also be > removed from the file system in the near future). Once the NSDs are > emptied then it will be a very (relatively) fast mmdeldisk > operation. So is that possible? > > As you can likely tell, I don?t have enough space to just delete > both sets of disks at once during a (yay!) full file system > migration to the new GPFS 5.x version. > > Thought this might be useful to others, so posted here. Thanks in > advance neighbors! > -Bryan_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=QxEYrybXOI6xpUEVxZumWQYDMDbDLx4O4vrm0PNotMw&s=4M2- > uNMOrvL7kEQu_UmL5VvnkKfPL-EpSapVGkSX1jc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Apr 23 13:55:43 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 23 Apr 2020 12:55:43 +0000 Subject: [gpfsug-discuss] S3, S3A & S3n support Message-ID: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> From PS K: >Does SS object protocol support S3a and S3n? Can you share some more details of your requirements, use case, etc., either here on the list or privately with me? We?re currently looking at the strategic direction of our S3 support. As Brian said, today it?s strictly the ?traditional? S3 protocol, but we are evaluating where to go next. Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_219535040] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From skariapaul at gmail.com Fri Apr 24 09:24:53 2020 From: skariapaul at gmail.com (PS K) Date: Fri, 24 Apr 2020 16:24:53 +0800 Subject: [gpfsug-discuss] S3, S3A & S3n support In-Reply-To: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> References: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> Message-ID: This is for spark integration which supports only s3a. Cheers On Thu, Apr 23, 2020 at 8:55 PM Carl Zetie - carlz at us.ibm.com < carlz at us.ibm.com> wrote: > From PS K: > > >Does SS object protocol support S3a and S3n? > > > > Can you share some more details of your requirements, use case, etc., > either here on the list or privately with me? > > > > We?re currently looking at the strategic direction of our S3 support. As > Brian said, today it?s strictly the ?traditional? S3 protocol, but we are > evaluating where to go next. > > > > Thanks, > > > > Carl Zetie > > Program Director > > Offering Management > > Spectrum Scale > > ---- > > (919) 473 3318 ][ Research Triangle Park > > carlz at us.ibm.com > > [image: signature_219535040] > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: not available URL: From TROPPENS at de.ibm.com Mon Apr 27 10:28:59 2020 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 27 Apr 2020 09:28:59 +0000 Subject: [gpfsug-discuss] Chart decks of German User Meeting are now available Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Tue Apr 28 07:34:37 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 28 Apr 2020 08:34:37 +0200 (CEST) Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? Message-ID: <239358449.52194.1588055677577@privateemail.com> Hi All, Can anyone share some thoughts on how to tune AFM for stability? 
at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ? Cache Site only: TCP Settings: sunrpc.tcp_slot_table_entries = 128 Home and Cache: AFM / GPFS Settings: maxBufferDescs=163840 afmHardMemThreshold=25G afmMaxWriteMergeLen=30G Cache fileset: Attributes for fileset AFMFILESET: ================================ Status Linked Path /mnt/fs02/AFMFILESET Id 1 Root inode 524291 Parent Id 0 Created Tue Apr 14 15:57:43 2020 Comment Inode space 1 Maximum number of inodes 10000384 Allocated inodes 10000384 Permission change flag chmodAndSetacl afm-associated Yes Target nfs://DK_VPN/mnt/fs01/AFMFILESET Mode single-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Read Threads per Gateway 64 Parallel Read Chunk Size 128 Parallel Read Threshold 1024 Number of Gateway Flush Threads 48 Prefetch Threshold 0 (default) Eviction Enabled yes (default) Parallel Write Threshold 1024 Parallel Write Chunk Size 128 Number of Write Threads per Gateway 16 IO Flags 0 (default) mmfsadm dump afm: AFM Gateway: RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 readBypassThresh 67108864 QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 Ping thread: Started Fileset: AFMFILESET 1 (fs02) mode: single-writer queue: Normal MDS: QMem 0 CTL 577 home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 handler: Mounted Dirty refCount: 1 queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 i/o: prefetchThresh 0 (Prefetch) Mnt status: 0:0 1:0 2:0 3:0 Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ Priority Queue: Empty (state: Active) Normal Queue: Empty (state: Active) Cluster Config Cache: maxFilesToCache 131072 maxStatCache 524288 afmDIO 2 afmIOFlags 4096 maxReceiverThreads 32 afmNumReadThreads 64 afmNumWriteThreads 8 afmHardMemThreshold 26843545600 maxBufferDescs 163840 afmMaxWriteMergeLen 32212254720 workerThreads 1024 The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. 
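For context, a cache-side configuration of that shape is typically assembled roughly as below; this is a hedged sketch reusing the names from this post (DK_VPN, fs02, AFMFILESET) with made-up gateway node names, not the exact commands that were run here:

-------------------------------------------------------------
# map the four home CES export servers to cache gateway nodes so AFM
# can spread writes across them (gw1..gw4 are placeholder node names)
mmafmconfig add DK_VPN --export-map 10.110.5.10/gw1,10.110.5.11/gw2,10.110.5.12/gw3,10.110.5.13/gw4

# single-writer cache fileset pointing at the mapped NFS target
mmcrfileset fs02 AFMFILESET --inode-space new \
    -p afmMode=sw,afmTarget=nfs://DK_VPN/mnt/fs01/AFMFILESET
mmlinkfileset fs02 AFMFILESET -J /mnt/fs02/AFMFILESET

# per-fileset parallel write tuning, mirroring the values listed above
mmchfileset fs02 AFMFILESET -p afmParallelWriteThreshold=1024
mmchfileset fs02 AFMFILESET -p afmParallelWriteChunkSize=128
-------------------------------------------------------------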
If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) Many Thanks in Advance! Andi Christiansen -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Tue Apr 28 11:57:48 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Tue, 28 Apr 2020 12:57:48 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup Message-ID: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Hi, when the gpfs systemd service returns from startup the filesystems are usually not mounted. So having another service depending on gpfs is not feasible if you require the filesystem(s). Therefore we have added a script to the systemd gpfs service that waits for all local gpfs filesystems being mounted. We have added that script via ExecStartPost: ------------------------------------------------------------ # cat /etc/systemd/system/gpfs.service.d/waitmount.conf [Service] ExecStartPost=/usr/local/sc-gpfs/sbin/wait-for-all_local-mounts.sh TimeoutStartSec=200 ------------------------------------------------------------- The script itself is not doing much: ------------------------------------------------------------- #!/bin/bash # # wait until all _local_ gpfs filesystems are mounted. It ignored # filesystems where mmlsfs -A does not report "yes". # # returns 0 if all fs are mounted (or none are found in gpfs configuration) # returns non-0 otherwise # wait for max. TIMEOUT seconds TIMEOUT=180 # leading space is required! FS=" $(/usr/lpp/mmfs/bin/mmlsfs all_local -Y 2>/dev/null | grep :automaticMountOption:yes: | cut -d: -f7 | xargs; exit ${PIPESTATUS[0]})" # RC=1 and no output means there are no such filesystems configured in GPFS [ $? -eq 1 ] && [ "$FS" = " " ] && exit 0 # uncomment this line for testing #FS="$FS gpfsdummy" while [ $TIMEOUT -gt 0 ]; do for fs in ${FS}; do if findmnt $fs -n &>/dev/null; then FS=${FS/ $fs/} continue 2; fi done [ -z "${FS// /}" ] && break (( TIMEOUT -= 5 )) sleep 5 done if [ -z "${FS// /}" ]; then exit 0 else echo >&2 "ERROR: filesystem(s) not found in time:${FS}" exit 2 fi -------------------------------------------------- This works without problems on _most_ of our clusters. However, not on all. Some of them show what I believe is a race condition and fail to startup after a reboot: ---------------------------------------------------------------------- # journalctl -u gpfs -- Logs begin at Fri 2020-04-24 17:11:26 CEST, end at Tue 2020-04-28 12:47:34 CEST. -- Apr 24 17:12:13 myhost systemd[1]: Starting General Parallel File System... Apr 24 17:12:17 myhost mmfs[5720]: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg. Apr 24 17:13:44 myhost systemd[1]: gpfs.service start-post operation timed out. Stopping. Apr 24 17:13:44 myhost mmremote[8966]: Shutting down! Apr 24 17:13:48 myhost mmremote[8966]: Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfs26 Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfslinux Apr 24 17:13:48 myhost systemd[1]: Failed to start General Parallel File System. Apr 24 17:13:48 myhost systemd[1]: Unit gpfs.service entered failed state. Apr 24 17:13:48 myhost systemd[1]: gpfs.service failed. 
---------------------------------------------------------------------- The mmfs.log shows a bit more: ---------------------------------------------------------------------- # less /var/adm/ras/mmfs.log.previous 2020-04-24_17:12:14.609+0200: runmmfs starting (4254) 2020-04-24_17:12:14.622+0200: [I] Removing old /var/adm/ras/mmfs.log.* files: 2020-04-24_17:12:14.658+0200: runmmfs: [I] Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra 2020-04-24_17:12:14.692+0200: runmmfs: [I] Unloading module mmfs26 2020-04-24_17:12:14.901+0200: runmmfs: [I] Unloading module mmfslinux 2020-04-24_17:12:15.018+0200: runmmfs: [I] Unloading module tracedev 2020-04-24_17:12:15.057+0200: runmmfs: [I] Loading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra Module Size Used by mmfs26 2657452 0 mmfslinux 809734 1 mmfs26 tracedev 48618 2 mmfs26,mmfslinux 2020-04-24_17:12:16.720+0200: Node rebooted. Starting mmautoload... 2020-04-24_17:12:17.011+0200: [I] This node has a valid standard license 2020-04-24_17:12:17.011+0200: [I] Initializing the fast condition variables at 0x5561DFC365C0 ... 2020-04-24_17:12:17.011+0200: [I] mmfsd initializing. {Version: 5.0.4.2 Built: Jan 27 2020 12:13:06} ... 2020-04-24_17:12:17.011+0200: [I] Cleaning old shared memory ... 2020-04-24_17:12:17.012+0200: [I] First pass parsing mmfs.cfg ... 2020-04-24_17:12:17.013+0200: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg. 2020-04-24_17:12:20.667+0200: mmautoload: Starting GPFS ... 2020-04-24_17:13:44.846+0200: mmremote: Initiating GPFS shutdown ... 2020-04-24_17:13:47.861+0200: mmremote: Starting the mmsdrserv daemon ... 2020-04-24_17:13:47.955+0200: mmremote: Unloading GPFS kernel modules ... 2020-04-24_17:13:48.165+0200: mmremote: Completing GPFS shutdown ... -------------------------------------------------------------------------- Starting the gpfs service again manually then works without problems. Interestingly the missing mmfs.cfg _is there_ after the shutdown, it gets created shortly after the failure. That's why I am assuming a race condition: -------------------------------------------------------------------------- # stat /var/mmfs/gen/mmfs.cfg File: ?/var/mmfs/gen/mmfs.cfg? Size: 408 Blocks: 8 IO Block: 4096 regular file Device: fd00h/64768d Inode: 268998265 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:var_t:s0 Access: 2020-04-27 17:12:19.801060073 +0200 Modify: 2020-04-24 17:12:17.617823441 +0200 Change: 2020-04-24 17:12:17.659823405 +0200 Birth: - -------------------------------------------------------------------------- Now, the interesting part: - removing the ExecStartPost script makes the issue vanish. Reboot is always startign gpfs successfully - reducing the ExecStartPost to simply one line ("exit 0") makes the issue stay. gpfs startup always fails. Unfortunately IBM is refusing support because "the script is not coming with gpfs". So I am searching for a solution that makes the script work on those servers again. Or a better way to wait for all local gpfs mounts being ready. Has anyone written something like that already? Thank you, Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From stockf at us.ibm.com Tue Apr 28 12:30:38 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 28 Apr 2020 11:30:38 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Apr 28 12:30:38 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 28 Apr 2020 11:30:38 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Apr 28 12:37:24 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 28 Apr 2020 17:07:24 +0530 Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? In-Reply-To: <239358449.52194.1588055677577@privateemail.com> References: <239358449.52194.1588055677577@privateemail.com> Message-ID: Hi, What is lock down of AFM fileset ? Are the messages in requeued state and AFM won't replicate any data ? I would recommend opening a ticket by collecting the logs and internaldump from the gateway node when the replication is stuck. You can also try increasing the value of afmAsyncOpWaitTimeout option and see if this solves the issue. mmchconfig afmAsyncOpWaitTimeout=3600 -i ~Venkat (vpuvvada at in.ibm.com) From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 04/28/2020 12:04 PM Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Can anyone share some thoughts on how to tune AFM for stability? at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ? 
Cache Site only: TCP Settings: sunrpc.tcp_slot_table_entries = 128 Home and Cache: AFM / GPFS Settings: maxBufferDescs=163840 afmHardMemThreshold=25G afmMaxWriteMergeLen=30G Cache fileset: Attributes for fileset AFMFILESET: ================================ Status Linked Path /mnt/fs02/AFMFILESET Id 1 Root inode 524291 Parent Id 0 Created Tue Apr 14 15:57:43 2020 Comment Inode space 1 Maximum number of inodes 10000384 Allocated inodes 10000384 Permission change flag chmodAndSetacl afm-associated Yes Target nfs://DK_VPN/mnt/fs01/AFMFILESET Mode single-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Read Threads per Gateway 64 Parallel Read Chunk Size 128 Parallel Read Threshold 1024 Number of Gateway Flush Threads 48 Prefetch Threshold 0 (default) Eviction Enabled yes (default) Parallel Write Threshold 1024 Parallel Write Chunk Size 128 Number of Write Threads per Gateway 16 IO Flags 0 (default) mmfsadm dump afm: AFM Gateway: RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 readBypassThresh 67108864 QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 Ping thread: Started Fileset: AFMFILESET 1 (fs02) mode: single-writer queue: Normal MDS: QMem 0 CTL 577 home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 handler: Mounted Dirty refCount: 1 queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 i/o: prefetchThresh 0 (Prefetch) Mnt status: 0:0 1:0 2:0 3:0 Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ Priority Queue: Empty (state: Active) Normal Queue: Empty (state: Active) Cluster Config Cache: maxFilesToCache 131072 maxStatCache 524288 afmDIO 2 afmIOFlags 4096 maxReceiverThreads 32 afmNumReadThreads 64 afmNumWriteThreads 8 afmHardMemThreshold 26843545600 maxBufferDescs 163840 afmMaxWriteMergeLen 32212254720 workerThreads 1024 The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) Many Thanks in Advance! 
Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=-XbtU1ILcqI_bUurDD3j1j-oqGszcNZAbQVIhQ5EZOs&s=IjrGy-VdY1cuNfy0bViEykWMEVDax7_xvrMdRhQ2QkM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Apr 28 12:38:01 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 28 Apr 2020 12:38:01 +0100 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> On 28/04/2020 11:57, Ulrich Sibiller wrote: > Hi, > > when the gpfs systemd service returns from startup the filesystems are > usually not mounted. So having another service depending on gpfs is not > feasible if you require the filesystem(s). > > Therefore we have added a script to the systemd gpfs service that waits > for all local gpfs filesystems being mounted. We have added that script > via ExecStartPost: > Yuck, and double yuck. There are many things you can say about systemd (and I have a choice few) but one of them is that it makes this sort of hackery obsolete. At least that is one of it goals. A systemd way to do it would be via one or more helper units. So lets assume your GPFS file system is mounted on /gpfs, then create a file called ismounted.txt on it and then create a unit called say gpfs_mounted.target that looks like # gpfs_mounted.target [Unit] TimeoutStartSec=infinity ConditionPathExists=/gpfs/ismounted.txt ExecStart=/usr/bin/sleep 10 RemainAfterExit=yes Then the main unit gets Wants=gpfs_mounted.target After=gpfs_mounted.target If you are using scripts in systemd you are almost certainly doing it wrong :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From juergen.hannappel at desy.de Tue Apr 28 12:55:50 2020 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Tue, 28 Apr 2020 13:55:50 +0200 (CEST) Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> Message-ID: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Hi, a gpfs.mount target should be automatically created at boot by the systemd-fstab-generator from the fstab entry, so no need with hackery like ismountet.txt... ----- Original Message ----- > From: "Jonathan Buzzard" > To: gpfsug-discuss at spectrumscale.org > Sent: Tuesday, 28 April, 2020 13:38:01 > Subject: Re: [gpfsug-discuss] wait for mount during gpfs startup > Yuck, and double yuck. There are many things you can say about systemd > (and I have a choice few) but one of them is that it makes this sort of > hackery obsolete. At least that is one of it goals. > > A systemd way to do it would be via one or more helper units. 
So lets > assume your GPFS file system is mounted on /gpfs, then create a file > called ismounted.txt on it and then create a unit called say > gpfs_mounted.target that looks like > > > # gpfs_mounted.target > [Unit] > TimeoutStartSec=infinity > ConditionPathExists=/gpfs/ismounted.txt > ExecStart=/usr/bin/sleep 10 > RemainAfterExit=yes > > Then the main unit gets > > Wants=gpfs_mounted.target > After=gpfs_mounted.target > > If you are using scripts in systemd you are almost certainly doing it > wrong :-) > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From carlz at us.ibm.com Tue Apr 28 13:10:56 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Tue, 28 Apr 2020 12:10:56 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup (Ulrich Sibiller) Message-ID: There?s an RFE related to this: RFE 125955 (https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955) I recommend that people add their votes and comments there as well as discussing it here in the UG. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1027147421] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From andi at christiansen.xxx Tue Apr 28 13:25:37 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 28 Apr 2020 14:25:37 +0200 (CEST) Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? In-Reply-To: References: <239358449.52194.1588055677577@privateemail.com> Message-ID: <467674858.57941.1588076737138@privateemail.com> Hi Venkat, The AFM fileset becomes totally unresponsive from all nodes within the cluster and the only way to resolve it is to do a "mmshutdown" and wait 2 mins, then "mmshutdown" again as it cannot really do it the first time.. and then a "mmstartup" then all is back to normal and AFM is stopped and can be started again for another week or so.. mmafmctl stop -j will just hang endless.. i will try to set that value and see if that does anything for us :) Thanks! Best Regards Andi Christiansen > On April 28, 2020 1:37 PM Venkateswara R Puvvada wrote: > > > Hi, > > What is lock down of AFM fileset ? Are the messages in requeued state and AFM won't replicate any data ? I would recommend opening a ticket by collecting the logs and internaldump from the gateway node when the replication is stuck. > > You can also try increasing the value of afmAsyncOpWaitTimeout option and see if this solves the issue. > > mmchconfig afmAsyncOpWaitTimeout=3600 -i > > ~Venkat (vpuvvada at in.ibm.com) > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 04/28/2020 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > --------------------------------------------- > > > > Hi All, > > Can anyone share some thoughts on how to tune AFM for stability? 
at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ? > > > Cache Site only: > TCP Settings: > sunrpc.tcp_slot_table_entries = 128 > > > Home and Cache: > AFM / GPFS Settings: > maxBufferDescs=163840 > afmHardMemThreshold=25G > afmMaxWriteMergeLen=30G > > > Cache fileset: > Attributes for fileset AFMFILESET: > ================================ > Status Linked > Path /mnt/fs02/AFMFILESET > Id 1 > Root inode 524291 > Parent Id 0 > Created Tue Apr 14 15:57:43 2020 > Comment > Inode space 1 > Maximum number of inodes 10000384 > Allocated inodes 10000384 > Permission change flag chmodAndSetacl > afm-associated Yes > Target nfs://DK_VPN/mnt/fs01/AFMFILESET > Mode single-writer > File Lookup Refresh Interval 30 (default) > File Open Refresh Interval 30 (default) > Dir Lookup Refresh Interval 60 (default) > Dir Open Refresh Interval 60 (default) > Async Delay 15 (default) > Last pSnapId 0 > Display Home Snapshots no > Number of Read Threads per Gateway 64 > Parallel Read Chunk Size 128 > Parallel Read Threshold 1024 > Number of Gateway Flush Threads 48 > Prefetch Threshold 0 (default) > Eviction Enabled yes (default) > Parallel Write Threshold 1024 > Parallel Write Chunk Size 128 > Number of Write Threads per Gateway 16 > IO Flags 0 (default) > > > mmfsadm dump afm: > AFM Gateway: > RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 > readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 > readBypassThresh 67108864 > QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 > Ping thread: Started > Fileset: AFMFILESET 1 (fs02) > mode: single-writer queue: Normal MDS: QMem 0 CTL 577 > home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 > handler: Mounted Dirty refCount: 1 > queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 > remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 > queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 > handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 > lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 > i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 > i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 > i/o: prefetchThresh 0 (Prefetch) > Mnt status: 0:0 1:0 2:0 3:0 > Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ > Priority Queue: Empty (state: Active) > Normal Queue: Empty (state: Active) > > > Cluster Config Cache: > maxFilesToCache 131072 > maxStatCache 524288 > afmDIO 2 > afmIOFlags 4096 > maxReceiverThreads 32 > afmNumReadThreads 64 > afmNumWriteThreads 8 > afmHardMemThreshold 26843545600 > maxBufferDescs 163840 > afmMaxWriteMergeLen 32212254720 > workerThreads 1024 > > > The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. 
> > > The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. > > > If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) > > > Many Thanks in Advance! > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Tue Apr 28 14:57:36 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 28 Apr 2020 06:57:36 -0700 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: <20200428135736.3zqcvvupj2ipvjfw@illiuin> We use callbacks successfully to ensure Linux auditd rules are only loaded after GPFS is mounted. It was easy to setup, and there's very fine-grained events that you can trigger on: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmaddcallback.htm On Tue, Apr 28, 2020 at 11:30:38AM +0000, Frederick Stock wrote: > Have you looked a the mmaddcallback command and specifically the file system mount callbacks? -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From novosirj at rutgers.edu Tue Apr 28 17:33:34 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 28 Apr 2020 16:33:34 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Message-ID: <2F49D93E-18CA-456D-9815-ACB581A646B7@rutgers.edu> Has anyone confirmed this? At one point, I mucked around with this somewhat endlessly to try to get something sane and systemd-based to work and ultimately surrendered and inserted a 30 second delay. I didn?t try the ?check for the presence of a file? thing as I?m allergic to that sort of thing (at least more allergic than I am to a time-based delay). I believe everything that I tried happens before the mount is complete. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Apr 28, 2020, at 7:55 AM, Hannappel, Juergen wrote: > > Hi, > a gpfs.mount target should be automatically created at boot by the > systemd-fstab-generator from the fstab entry, so no need with hackery like > ismountet.txt... > > > ----- Original Message ----- >> From: "Jonathan Buzzard" >> To: gpfsug-discuss at spectrumscale.org >> Sent: Tuesday, 28 April, 2020 13:38:01 >> Subject: Re: [gpfsug-discuss] wait for mount during gpfs startup > >> Yuck, and double yuck. There are many things you can say about systemd >> (and I have a choice few) but one of them is that it makes this sort of >> hackery obsolete. At least that is one of it goals. 
>> >> A systemd way to do it would be via one or more helper units. So lets >> assume your GPFS file system is mounted on /gpfs, then create a file >> called ismounted.txt on it and then create a unit called say >> gpfs_mounted.target that looks like >> >> >> # gpfs_mounted.target >> [Unit] >> TimeoutStartSec=infinity >> ConditionPathExists=/gpfs/ismounted.txt >> ExecStart=/usr/bin/sleep 10 >> RemainAfterExit=yes >> >> Then the main unit gets >> >> Wants=gpfs_mounted.target >> After=gpfs_mounted.target >> >> If you are using scripts in systemd you are almost certainly doing it >> wrong :-) >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Tue Apr 28 18:32:25 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 28 Apr 2020 17:32:25 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup (Ulrich Sibiller) In-Reply-To: References: Message-ID: I?ve also voted and commented on the ticket, but I?ll say this here: If the amount of time I spent on this alone (and I like to think I?m pretty good with this sort of thing, and am somewhat of a systemd evangelist when the opportunity presents itself), this has caused a lot of people a lot of pain ? including time spent when their kludge to make this work causes some other problem, or having to reboot nodes in a much more manual way at times to ensure one of these nodes doesn?t dump work while it has no FS, etc. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Apr 28, 2020, at 8:10 AM, Carl Zetie - carlz at us.ibm.com wrote: > > There?s an RFE related to this: RFE 125955 (https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955) > > I recommend that people add their votes and comments there as well as discussing it here in the UG. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at spectrumscale.org Wed Apr 29 22:29:34 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 29 Apr 2020 22:29:34 +0100 Subject: [gpfsug-discuss] THINK Virtual User Group Day Message-ID: <5BE5B210-5FEE-45E0-AC0D-1B184B5B8E45@spectrumscale.org> Hi All, As part of IBM?s THINK digital event, there will be a virtual user group day. This isn?t an SSUG event, though we?ve been involved in some of the discussion about the topics for the event. Three of the four Storage sessions are focussed on Spectrum Scale. For storage this will be taking place on May 19th. Details of how to register for this event and the planned sessions are below (though I guess are still subject to change). 
Separately to this, the SSUG organisers are still in discussion about how we might present some sort of digital SSUG event, it won?t be a half/full day of talks, but likely a series of talks ? but we?re still working through the details with Ulf and the IBM team about how it might work. And if you are interested in THINK, this is free to register for this year as a digital only event https://www.ibm.com/events/think ? I promise this is my only reference to THINK ? Simon The registration site for the user group day is https://ibm-usergroups.bemyapp.com/ Storage Session 1 Title IBM Spectrum Scale: Use Cases and Field Lessons-learned with Kubernetes and OpenShift Abstract IBM Spectrum Scale user group leaders will discuss how to deploy IBM Spectrum Scale using Kubernetes and OpenShift, persistent volumes, IBM Storage Enabler for Containers, Kubernetes FlexVolume Drivers and IBM Spectrum Connect. We'll review real-world IBM Spectrum Scale use cases including advanced driver assistance systems (ADAS), cloud service providers (CSP), dev/test and multi-cloud. We'll also review most often-requested client topics including unsupported CSI platforms, security, multi-tenancy and how to deploy Spectrum Scale in heterogenous environments such as x86, IBM Power, and IBM Z by using IBM Cloud Private and OpenShift. Finally we'll describe IBM resources such as regional storage competency centers, training, testing labs and IBM Lab Services. Presenter Harald Seipp, Senior Technical Staff Member, Center of Excellence for Cloud Storage Storage Session 2 Title How to Efficiently Manage your Hadoop and Analytics Workflow with IBM Spectrum Scale Abstract This in-depth technical talk will compare traditional Hadoop vs. IBM Spectrum Scale through Hadoop Distributed File System (HDFS) on IBM Spectrum Scale, HDFS storage tiering & federation, HDFS backup, using IBM Spectrum Scale as an ingest tier, next generation workloads, disaster recovery and fault-tolerance using a single stretch cluster or multiple clusters using active file management (AFM), as well as HDFS integration within Cluster Export Services (CES). Presenter Andreas Koeninger, IBM Spectrum Scale Big Data and Analytics Storage Session 3 Title IBM Spectrum Scale: How to enable AI Workloads with OpenShift and IBM Spectrum Scale Abstract IBM Spectrum Scale user group leaders will deliver a in-depth technical presentation covering the enterprise AI data pipeline from ingest to insights, how to manage workloads at scale, how to integrate OpenShift 4.x and IBM Spectrum Scale 5.0.4.1, as well as preparing and installing the IBM Spectrum Scale CSI driver in OpenShift. We will also cover Kubernetes/OpenShift persistent volumes and use cases for provisioning with IBM Spectrum Scale CSI for AI workloads. Finally we will feature a demo of IBM Spectrum Scale CSI and TensorFlow in OpenShift 4.x. Presenters Gero Schmidt, IBM Spectrum Scale Development, Big Data Analytics Solutions Przemyslaw Podfigurny, IBM Spectrum Scale Development, AI/ML Big Data and Analytics Storage Session 4 Title Journey to Modern Data Protection for a Large Manufacturing Client Abstract In this webinar, we will discuss how industrial manufacturing organizations are addressing data protection. We will look at why holistic data protection is a critical infrastructure component and how modernization can provide a foundation for the future. 
We will share how customers are leveraging the IBM Spectrum Protect portfolio to address their IT organization's data protection, business continuity with software-defined data protection solutions. We will discuss various applications including data reuse, as well as providing instant access to data which can help an organization be more agile and reduce downtime. Presenters Adam Young, Russell Dwire -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Thu Apr 30 11:50:27 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 12:50:27 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <20200428135736.3zqcvvupj2ipvjfw@illiuin> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: Am 28.04.20 um 15:57 schrieb Skylar Thompson: >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > We use callbacks successfully to ensure Linux auditd rules are only loaded > after GPFS is mounted. It was easy to setup, and there's very fine-grained > events that you can trigger on: Thanks. But how do set this up for a systemd service? Disable the dependent service and start it from the callback? Create some kind of state file in the callback and let the dependent systemd service check that flag file in a busy loop? Use inotify for the flag file? Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Thu Apr 30 11:50:39 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 12:50:39 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Message-ID: <4c9f3acc-cfc7-05a5-eca5-2054c67c0cc4@science-computing.de> Am 28.04.20 um 13:55 schrieb Hannappel, Juergen: > a gpfs.mount target should be automatically created at boot by the > systemd-fstab-generator from the fstab entry, so no need with hackery like > ismountet.txt... A generic gpfs.mount target does not seem to exist on my system. There are only specific mount targets for the mounted gpfs filesystems. So I'd need to individually configure each depend service on each system with the filesystem for wait for. My approach was more general in just waiting for all_local gpfs filesystems. So I can use the same configuration everywhere. Besides, I have once tested and found that these targets are not usable because of some oddities but unfortunately I don't remember details. But the outcome was my script from the initial post. Maybe it was that there's no automatic mount target for all_local, same problem as above. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Thu Apr 30 12:14:07 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 13:14:07 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> Message-ID: Am 28.04.20 um 13:38 schrieb Jonathan Buzzard: > Yuck, and double yuck. There are many things you can say about systemd > (and I have a choice few) but one of them is that it makes this sort of > hackery obsolete. At least that is one of it goals. > > A systemd way to do it would be via one or more helper units. So lets > assume your GPFS file system is mounted on /gpfs, then create a file > called ismounted.txt on it and then create a unit called say > gpfs_mounted.target that looks like > > > # gpfs_mounted.target > [Unit] > TimeoutStartSec=infinity > ConditionPathExists=/gpfs/ismounted.txt > ExecStart=/usr/bin/sleep 10 > RemainAfterExit=yes > > Then the main unit gets > > Wants=gpfs_mounted.target > After=gpfs_mounted.target > > If you are using scripts in systemd you are almost certainly doing it > wrong :-) Yes, that the right direction. But still not the way I'd like it to be. First, I don't really like the flag file stuff. Imagine the mess you'd create if multiple services would require flag files... Second, I am looking for an all_local target. That one cannot be solved using this approach, right? (same for all_remote or all) Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From scale at us.ibm.com Thu Apr 30 12:40:57 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 30 Apr 2020 07:40:57 -0400 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de><20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: I now better understand the functionality you were aiming to achieve. You want anything in systemd that is dependent on GPFS file systems being mounted to block until they are mounted. Currently we do not offer any such feature though as Carl Zetie noted there is an RFE for such functionality, RFE 125955 ( https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955 ). For the mmaddcallback what I was thinking could resolve your problem was for you to create a either a "startup" callback or "mount" callbacks for your file systems. I thought you could use those callbacks to track the file systems of interest and then use the appropriate means to integrate that information into the flow of systemd. I have never done this so perhaps it is not possible. 
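One rough way to wire that up, purely as a sketch (the callback name, script path, file system name and unit name below are invented):

-------------------------------------------------------------
# register a callback that fires when a file system is mounted
mmaddcallback startAfterGpfsMount \
    --command /usr/local/sbin/gpfs-mounted.sh \
    --event mount --parms "%fsName"

# /usr/local/sbin/gpfs-mounted.sh (hypothetical helper)
#!/bin/bash
# start the dependent unit once the file system we care about is mounted
if [ "$1" = "gpfs0" ]; then
    systemctl start --no-block my-dependent.service
fi
exit 0
-------------------------------------------------------------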
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ulrich Sibiller To: gpfsug-discuss at spectrumscale.org Date: 04/30/2020 06:57 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] wait for mount during gpfs startup Sent by: gpfsug-discuss-bounces at spectrumscale.org Am 28.04.20 um 15:57 schrieb Skylar Thompson: >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > We use callbacks successfully to ensure Linux auditd rules are only loaded > after GPFS is mounted. It was easy to setup, and there's very fine-grained > events that you can trigger on: Thanks. But how do set this up for a systemd service? Disable the dependent service and start it from the callback? Create some kind of state file in the callback and let the dependent systemd service check that flag file in a busy loop? Use inotify for the flag file? Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=KmkFZ30Ey3pB4QnhsP2vS2mmojVLAWGrIiStGaE0320&s=VHWoLbiq119iFhL724WAQwg4dSJ3KRNVSXnfrFBv9RQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Apr 30 14:43:28 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 30 Apr 2020 06:43:28 -0700 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: <20200430134328.7qshqlrptw6hquls@illiuin> On Thu, Apr 30, 2020 at 12:50:27PM +0200, Ulrich Sibiller wrote: > Am 28.04.20 um 15:57 schrieb Skylar Thompson: > >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > > > We use callbacks successfully to ensure Linux auditd rules are only loaded > > after GPFS is mounted. It was easy to setup, and there's very fine-grained > > events that you can trigger on: > > Thanks. But how do set this up for a systemd service? Disable the dependent service and start it > from the callback? Create some kind of state file in the callback and let the dependent systemd > service check that flag file in a busy loop? Use inotify for the flag file? 
In the pre-systemd days, I would say just disable the service and let the callback handle it. I do see your point, though, that you lose the other systemd ordering benefits if you start the service from the callback. Assuming you're still able to start the service via systemctl, I would probably just leave it disabled and let the callback handle it. In the case of auditd rules, it's not actually a service (just a command that needs to be run) so we didn't run into this specific problem.

--
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
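On the systemd side of the same question, a hedged variation on the flag-file idea from earlier in the thread is to have the mount callback touch a flag under /run and let a systemd path unit (which watches via inotify) pull in the dependent service, instead of polling in a loop; all paths and unit names here are illustrative only:

-------------------------------------------------------------
# callback script (registered with mmaddcallback --event mount --parms "%fsName"):
#!/bin/bash
mkdir -p /run/gpfs-mounted
[ "$1" = "gpfs0" ] && touch /run/gpfs-mounted/gpfs0
exit 0

# path unit that starts the real service when the flag appears;
# /run is tmpfs, so the flag is cleared on every reboot
cat > /etc/systemd/system/gpfs0-mounted.path <<'EOF'
[Unit]
Description=Wait for GPFS file system gpfs0 to be mounted

[Path]
PathExists=/run/gpfs-mounted/gpfs0
Unit=my-dependent.service

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now gpfs0-mounted.path
-------------------------------------------------------------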
From jonathan.buzzard at strath.ac.uk Wed Apr 1 15:06:43 2020
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Wed, 1 Apr 2020 15:06:43 +0100
Subject: [gpfsug-discuss] DSS-G dowloads
Message-ID: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk>

I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8

Anyone know what is going on? Has it moved and if so to where?

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From S.J.Thompson at bham.ac.uk Wed Apr 1 15:40:30 2020
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 1 Apr 2020 14:40:30 +0000
Subject: [gpfsug-discuss] DSS-G dowloads
In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk>
References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk>
Message-ID: <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk>

It moved. We had email notifications about this ages ago. Accounts were created automatically for us for those on the contract admin role. 2.5c is latest release (5.0.4-1.6 or 4.2.3-18)

Go to https://commercial.lenovo.com/

Simon

Simon

?On 01/04/2020, 15:06, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote:

I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8

Anyone know what is going on? Has it moved and if so to where?

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jroche at lenovo.com Wed Apr 1 15:34:36 2020 From: jroche at lenovo.com (Jim Roche) Date: Wed, 1 Apr 2020 14:34:36 +0000 Subject: [gpfsug-discuss] [External] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: Hi Jonathan, I don't think the site has moved. I'm investigating why it cannot be found and will let you know what is going on. Regards, Jim Jim Roche Head of Research Computing University Relations Manager Redwood, 3 Chineham Business Park, Crockford Lane Basingstoke Hampshire RG24 8WQ Lenovo UK +44 7702678579 jroche at lenovo.com ? Lenovo.com? Twitter?|?Instagram?|?Facebook?|?Linkedin?|?YouTube?|?Privacy? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 01 April 2020 15:07 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] DSS-G dowloads I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ncalimet at lenovo.com Wed Apr 1 15:46:32 2020 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Wed, 1 Apr 2020 14:46:32 +0000 Subject: [gpfsug-discuss] [External] DSS-G dowloads In-Reply-To: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> Message-ID: <477be93f0bc8411a8d8c31935db28a4f@lenovo.com> The old Lenovo ESD website is gone; retired some time ago. Please visit instead: https://commercial.lenovo.com FWIW the most current release is DSS-G 2.5c. Thanks -- Nicolas Calimet, PhD | HPC System Architect | Lenovo DCG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: Wednesday, April 1, 2020 16:07 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] DSS-G dowloads I have just been trying to download the 2.4b release and I am not getting anywhere. A little investigation shows that lenovoesd.flexnetoperations.com does not resolve in the DNS. Not from work, not from home, not using 1.1.1.1 or 8.8.8.8 Anyone know what is going on? Has it moved and if so to where? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Apr 1 19:50:28 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 1 Apr 2020 19:50:28 +0100 Subject: [gpfsug-discuss] DSS-G dowloads In-Reply-To: <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> References: <58b5e5f2-8da5-9b14-0dfb-acfe9cef0713@strath.ac.uk> <763B4054-02F2-483A-9465-3102AB5493E1@bham.ac.uk> Message-ID: On 01/04/2020 15:40, Simon Thompson wrote: > It moved. We had email notifications about this ages ago. Accounts > were created automatically for us for those on the contract admin > role. 2.5c is latest release (5.0.4-1.6 or 4.2.3-18) > You are right once I search my spam folder. Thanks a bunch Microsoft. I am still not convinced that are still not evil. They seem determined to put my CentOS security emails in the spam folder. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jkavitsky at 23andme.com Fri Apr 3 23:25:33 2020 From: jkavitsky at 23andme.com (Jim Kavitsky) Date: Fri, 3 Apr 2020 15:25:33 -0700 Subject: [gpfsug-discuss] fast search for archivable data sets Message-ID: Hello everyone, I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, Jim Kavitsky -------------- next part -------------- An HTML attachment was scrubbed... URL: From pinto at scinet.utoronto.ca Sat Apr 4 00:45:18 2020 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Fri, 3 Apr 2020 19:45:18 -0400 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: Hi Jim, If you never worked with policy rules before, you may want to start by building your nerves to it. In the /usr/lpp/mmfs/samples/ilm path you will find several examples of templates that you can use to play around. I would start with the 'list' rules first. Some of those templates are a bit complex, so here is one script that I use on a regular basis to detect files larger than 1MB (you can even exclude specific filesets): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-large /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'largefiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files that have more than 1MB of space allocated. */ RULE 'r2' LIST 'largefiles' SHOW('-u' vc(USER_ID) || ' -s' || vc(FILE_SIZE)) /*FROM POOL 'system'*/ FROM POOL 'data' /*FOR FILESET('root')*/ WEIGHT(FILE_SIZE) WHERE KB_ALLOCATED > 1024 /* Files in special filesets, such as mmpolicyRules, are never moved or deleted */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('mmpolicyRules','todelete','tapenode-stuff','toarchive') ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ And here is another to detect files not looked at for more than 6 months. I found more effective to use atime and ctime. 
You could combine this with the one above to detect file size as well. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-atime-ctime-gt-6months /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'accessedfiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc, accessed prior to a certain date AND that are not owned by root. Include the owner's id with each object and sort them by the owner's id */ /* Files in special filesets, such as mmpolicyRules, are never moved or deleted */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET ('scratch-root','todelete','root') RULE 'r5' LIST 'accessedfiles' DIRECTORIES_PLUS FROM POOL 'data' SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -c' || vc(CREATION_TIME) || ' -s ' || vc(FILE_SIZE)) WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 183) AND (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME) > 183) AND NOT USER_ID = 0 AND NOT (PATH_NAME LIKE '/gpfs/fs0/scratch/r/root/%') ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that both these scripts work on a system wide (or root fileset) basis, and will not give you specific directories, unless you run them several times on specific directories (not very efficient). To produce general lists per directory you would need to do some post processing on the lists, with 'awk' or some other scripting language. If you need some samples I can send you. And finally, you need to be more specific by what you mean by 'archivable'. Once you produce the list you can do several things with them or leverage the rules to actually execute things, such as move, delete, or hsm stuff. The /usr/lpp/mmfs/samples/ilm path has some samples as well. On 4/3/2020 18:25:33, Jim Kavitsky wrote: > Hello everyone, > I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking?for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. > > Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, yes, there is another way, the 'mmfind' utility, also in the same sample path. You have to compile it for you OS (mmfind.README). This is a very powerful canned procedure that lets you run the "-exec" option just as in the normal linux version of 'find'. I use it very often, and it's just as efficient as the other policy rules based alternative. Good luck. Keep safe and confined. Jaime > > Jim Kavitsky > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > . . . ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. 
(MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From alex at calicolabs.com Sat Apr 4 00:50:50 2020 From: alex at calicolabs.com (Alex Chekholko) Date: Fri, 3 Apr 2020 16:50:50 -0700 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: Hi Jim, The common non-GPFS-specific way is to use a tool that dumps all of your filesystem metadata into an SQL database and then you can have a webapp that makes nice graphs/reports from the SQL database, or do your own queries. The Free Software example is "Robinhood" (use the POSIX scanner, not the lustre-specific one) and one proprietary example is Starfish. In both cases, you need a pretty beefy machine for the DB and the scanning of your filesystem may take a long time, depending on your filesystem performance. And then without any filesystem-specific hooks like a transaction log, you'll need to rescan the entire filesystem to update your db. Regards, Alex On Fri, Apr 3, 2020 at 3:25 PM Jim Kavitsky wrote: > Hello everyone, > I'm managing a low-multi-petabyte Scale filesystem with hundreds of > millions of inodes, and I'm looking for the best way to locate archivable > directories. For example, these might be directories where whose contents > were greater than 5 or 10TB, and whose contents had atimes greater than two > years. > > Has anyone found a great way to do this with a policy engine run? If not, > is there another good way that anyone would recommend? Thanks in advance, > > Jim Kavitsky > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Sat Apr 4 01:26:22 2020 From: cblack at nygenome.org (Christopher Black) Date: Sat, 4 Apr 2020 00:26:22 +0000 Subject: [gpfsug-discuss] fast search for archivable data sets In-Reply-To: References: Message-ID: As Alex mentioned, there are tools that will keep filesystem metadata in a database and provide query tools. NYGC uses Starfish and we?ve had good experience with it. At first the only feature we used is ?sfdu? which is a quick replacement for recursive du. Using this we can script csv reports for selections of dirs. As we use starfish more, we?ve started opening the web interface to people to look at selected areas of our filesystems where they can sort directories by size, mtime, atime, and run other reports and queries. We?ve also started using tagging functionality so we can quickly get an aggregate total (and growth over time) by tag across multiple directories. We tried Robinhood years ago but found it was taking too much work to get it to scale to 100s of millions of files and 10s of PiB on gpfs. It might be better now. IBM has a metadata product called Spectrum Discover that has the benefit of using gpfs-specific interfaces to be always up to date. Many of the other tools require scheduling scans to update the db. Igneous has a commercial tool called DataDiscover which also looked promising. ClarityNow and MediaFlux are other similar tools. I expect all of these tools at the very least have nice replacements for du and find as well as some sort of web directory tree view. We had run Starfish for a while and did a re-evaluation of a few options in 2019 and ultimately decided to stay with Starfish for now. 
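For a one-off answer to the original question, the built-in policy engine plus a little post-processing also gets you a long way without standing up a database. A minimal sketch, assuming a LIST rule with a SHOW('-u' uid ' -s' size) string like the ones Jaime posted earlier in the thread, and path names without blanks; the policy file name and paths here are made up, and the list-record parsing may need adjusting to whatever your list files actually contain:

# scan only, do not invoke the external list script; keep the generated list files
mmapplypolicy /gpfs/fs0 -P mmpolicyRules-list-large -I defer -f /tmp/archscan

# sum the reported file sizes per top-level directory under the filesystem root
awk -F ' -- ' '{
    n = split($1, attr, " ");            # inode gen snapid -u uid -s bytes
    for (i = 1; i < n; i++) if (attr[i] == "-s") bytes = attr[i+1];
    split($2, p, "/");                   # /gpfs/fs0/<dir>/...
    sum["/" p[2] "/" p[3] "/" p[4]] += bytes;
}
END { for (d in sum) printf "%14.1f GiB  %s\n", sum[d]/2^30, d }' /tmp/archscan.list.* | sort -rn

With an atime/size condition already in the rule (as in Jaime's second example), the output is just the directories that are big enough and cold enough to be worth archiving.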
Best, Chris From: on behalf of Alex Chekholko Reply-To: gpfsug main discussion list Date: Friday, April 3, 2020 at 7:51 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] fast search for archivable data sets Hi Jim, The common non-GPFS-specific way is to use a tool that dumps all of your filesystem metadata into an SQL database and then you can have a webapp that makes nice graphs/reports from the SQL database, or do your own queries. The Free Software example is "Robinhood" (use the POSIX scanner, not the lustre-specific one) and one proprietary example is Starfish. In both cases, you need a pretty beefy machine for the DB and the scanning of your filesystem may take a long time, depending on your filesystem performance. And then without any filesystem-specific hooks like a transaction log, you'll need to rescan the entire filesystem to update your db. Regards, Alex On Fri, Apr 3, 2020 at 3:25 PM Jim Kavitsky > wrote: Hello everyone, I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking for the best way to locate archivable directories. For example, these might be directories where whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. Has anyone found a great way to do this with a policy engine run? If not, is there another good way that anyone would recommend? Thanks in advance, Jim Kavitsky _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leslie.james.elliott at gmail.com Sat Apr 4 07:00:34 2020 From: leslie.james.elliott at gmail.com (leslie elliott) Date: Sat, 4 Apr 2020 16:00:34 +1000 Subject: [gpfsug-discuss] afmHashVersion Message-ID: I was wondering if there was any more information on the different values for afmHashVersion the default value is 2 but if we want to assign an afmGateway to a fileset we need a value of 5 is there likely to be any performance degradation because of this change do the home cluster and the cache cluster both have to be set to 5 for the fileset allocation to gateways just trying to find a little more information before we try this on a production system with a large number of afm independent filesets leslie -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Sat Apr 4 22:57:33 2020 From: spectrumscale at kiranghag.com (KG) Date: Sun, 5 Apr 2020 03:27:33 +0530 Subject: [gpfsug-discuss] io500 - mmfind - Pfind found 0 matches, something is wrong with the script. Message-ID: Hi Folks I am trying to setup IO500 test on a scale cluster and looking for more info on mmfind. I have compiled mmfindUtil_processOutputFile and updated the correct path in mmfind.sh. The runs however do not come up with any matches. Any pointers wrt something that I may have missed? 
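(A sanity check that narrows this down: run mmfind by hand, outside the io500 wrapper, against the same tree with the same predicates. If it prints the mdtest files, the policy side works and the problem is in the mmfind.sh/harness glue; if it prints nothing, the compiled helper or the policy invocation is the suspect. A sketch, assuming mmfind was built in the usual samples location:

cd /usr/lpp/mmfs/samples/ilm
./mmfind /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00 \
    -newer /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/timestampfile \
    -size 3901c -name "*01*" | head

A plain GNU find with the same predicates gives an independent count of how many files should match.)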
TIA

[Starting] mdtest_hard_write
[Exec] mpirun -np 2 /tools/io-500-dev-master/bin/mdtest -C -t -F -P -w 3901 -e 3901 -d /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/mdt_hard -n 950000 -x /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/mdt_hard-stonewall -a POSIX -N 1 -Y -W 5
[Results] in /ibm/nasdata/results/2020.04.05-03.20.00/mdtest_hard_write.txt.
[Warning] This cannot be an official IO-500 score. The phase runtime of 9.8918s is below 300s.
[Warning] Suggest io500_mdtest_hard_files_per_proc=30732525
[RESULT-invalid] IOPS phase 2 mdtest_hard_write 0.225 kiops : time 8.99 seconds
[Starting] find
[Exec] mpirun -np 2 /tools/io-500-dev-master/bin/mmfind.sh /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00 -newer /ibm/nasdata/datafiles/io500.2020.04.05-03.20.00/timestampfile -size 3901c -name "*01*"
[Results] in /ibm/nasdata/results/2020.04.05-03.20.00/find.txt.
[Warning] Pfind found 0 matches, something is wrong with the script.
[FIND] MATCHED 0/3192 in 12.0671 seconds

-------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Apr 6 10:16:49 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 6 Apr 2020 14:46:49 +0530 Subject: [gpfsug-discuss] afmHashVersion In-Reply-To: References: Message-ID: afmHashVersion=5 does not cause any performance degradation, this hash version allows assigning a gateway for the fileset using mmchfileset command. This option is not required for AFM home cluster(assuming that home is not a cache for other home). It is needed only at the AFM cache cluster and at client cluster if it remote mounts the AFM cache cluster. For changing afmHashVersion=5, all the nodes in the AFM cache and client cluster have to be upgraded to the minimum 5.0.2 level. This option cannot be set dynamically using -i/-I option, all the nodes in the both AFM cache and client clusters have to be shutdown to set this option. It is recommended to use 5.0.4-3 or later. https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_mmchconfig.htm ~Venkat (vpuvvada at in.ibm.com) From: leslie elliott To: gpfsug main discussion list Date: 04/04/2020 11:30 AM Subject: [EXTERNAL] [gpfsug-discuss] afmHashVersion Sent by: gpfsug-discuss-bounces at spectrumscale.org I was wondering if there was any more information on the different values for afmHashVersion the default value is 2 but if we want to assign an afmGateway to a fileset we need a value of 5 is there likely to be any performance degradation because of this change do the home cluster and the cache cluster both have to be set to 5 for the fileset allocation to gateways just trying to find a little more information before we try this on a production system with a large number of afm independent filesets leslie _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
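To make that sequence concrete, this is roughly what it looks like end to end; the node, filesystem and fileset names below are made up, and the exact gateway attribute syntax should be checked against man mmchfileset on your code level:

# on the AFM cache cluster (and on any client cluster that remote mounts it),
# once every node is at 5.0.2 or later (ideally 5.0.4-3 as recommended above):
mmshutdown -a                       # the hash version cannot be changed online with -i/-I
mmchconfig afmHashVersion=5
mmstartup -a

# afterwards a cache fileset can be pinned to a specific gateway node:
mmchfileset fs0 cachefset1 -p afmGateway=gw-node1
mmlsfileset fs0 cachefset1 --afm -L     # verify the assignment

The home cluster does not need the change unless it is itself a cache, matching what is said above.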
URL: From marc.caubet at psi.ch Mon Apr 6 12:20:59 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Mon, 6 Apr 2020 11:20:59 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Message-ID: Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From NSCHULD at de.ibm.com Mon Apr 6 13:25:22 2020 From: NSCHULD at de.ibm.com (Norbert Schuld) Date: Mon, 6 Apr 2020 14:25:22 +0200 Subject: [gpfsug-discuss] =?utf-8?q?=22csm=5Fresync=5Fneeded=22_after_upgr?= =?utf-8?q?ading_to_GPFS=09v5=2E0=2E4-2?= In-Reply-To: References: Message-ID: Hi, are the nodes running on AIX? If so my advice would be to change /var/mmfs/mmsysmon/mmsysmonitor.conf to read [InterNodeEventing] usesharedlib = 0 and the do a "mmsysmoncontrol restart". What was the min. release level before the upgrade? For most other cases a "mmsysmoncontrol restart" on the affected nodes + cluster manager node should do. Mit freundlichen Gr??en / Kind regards Norbert Schuld From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06.04.2020 13:36 Subject: [EXTERNAL] [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=i4V0h7L9ElftZNfcuPIXmAHN2jl5TLcuyFLqtinu4j8&m=gU-FoFUzF10SfzgJPcd51vPIxjhkE6puV5hxAyPIA6I&s=zdEGNkM_ZSiem6wnOFZFVpTGjvSPG4wlFUFIhDVqcWM&e= -------------- next part -------------- An HTML attachment was scrubbed... 
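In practice that boils down to something like the following, run on each node that keeps re-raising the event and then on the cluster manager (a minimal sketch):

mmlsmgr -c                      # find out which node is the cluster manager
mmsysmoncontrol restart         # restart the health monitor on the affected node(s) and on the manager
mmhealth node show --resync     # clear the cached state on the node
mmhealth cluster show           # confirm csm_resync_needed stops re-appearing cluster-wide

If the error still comes back after a few minutes, that is the point at which a support case with a gpfs.snap is worth opening.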
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From marc.caubet at psi.ch Mon Apr 6 13:54:43 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Mon, 6 Apr 2020 12:54:43 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: References: , Message-ID: <46571b6503544f329029b2520c70152e@psi.ch> Hi Norbert, thanks a lot for for answering. The nodes are running RHEL7.7 (Kernel 3.10.0-1062.12.1.el7.x86_64). The previous version was 5.0.3-2. I restarted mmsysmoncontrol (I kept usesharedlib=1 as this is RHEL). Restarting it, it cleans mmhealth messages as expected, let's see whether this is repeated or not but it might take several minutes. Just add that when I had a mix of 5.0.3-2 and 5.0.4-2 I received some 'stale_mount' messages (from GPFSGUI) for a remote cluster filesystem mountpoints, but apparently everything worked fine. After upgrading everything to v5.0.4-2 looks like the same nodes report the 'csm_resync_needed' instead (no more 'stale_mount' errors seen since then). I am not sure whether this is related or not but might be a hint if this is related. Best regards, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Norbert Schuld Sent: Monday, April 6, 2020 2:25:22 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Hi, are the nodes running on AIX? If so my advice would be to change /var/mmfs/mmsysmon/mmsysmonitor.conf to read [InterNodeEventing] usesharedlib = 0 and the do a "mmsysmoncontrol restart". What was the min. release level before the upgrade? For most other cases a "mmsysmoncontrol restart" on the affected nodes + cluster manager node should do. Mit freundlichen Gr??en / Kind regards Norbert Schuld [Inactive hide details for "Caubet Serrabou Marc (PSI)" ---06.04.2020 13:36:28---Hi all, after upgrading one of the clusters to]"Caubet Serrabou Marc (PSI)" ---06.04.2020 13:36:28---Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I From: "Caubet Serrabou Marc (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 06.04.2020 13:36 Subject: [EXTERNAL] [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. 
Thanks a lot, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: graycol.gif URL: From lists at esquad.de Mon Apr 6 13:50:28 2020 From: lists at esquad.de (Dieter Mosbach) Date: Mon, 6 Apr 2020 14:50:28 +0200 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: References: Message-ID: <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> Am 06.04.2020 um 13:20 schrieb Caubet Serrabou Marc (PSI): > Hi all, > > > after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. > > Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. > > > Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. This seems to be a bug in v5, open a support case. We had to check: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "grep usesharedlib /var/mmfs/mmsysmon/mmsysmonitor.conf" and to change: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "sed -i 's/usesharedlib = 1/usesharedlib = 0/g' /var/mmfs/mmsysmon/mmsysmonitor.conf" mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "mmsysmoncontrol restart" Regards, Dieter From marc.caubet at psi.ch Tue Apr 7 07:38:42 2020 From: marc.caubet at psi.ch (Caubet Serrabou Marc (PSI)) Date: Tue, 7 Apr 2020 06:38:42 +0000 Subject: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 In-Reply-To: <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> References: , <04d1ba1d-e3d0-41ab-85c7-c6d6cabfd0d4@esquad.de> Message-ID: <66cfa1b3942d45489c611d72e5b39d42@psi.ch> Hi, just for the record, after restarting mmsysmoncontrol on all nodes looks like the errors disappeared and no longer appear (and it has been running for several hours already). No need to change usesharedlib, which I have it enabled (1) for RHEL systems. Thanks a lot for your help, Marc _________________________________________________________ Paul Scherrer Institut High Performance Computing & Emerging Technologies Marc Caubet Serrabou Building/Room: OHSA/014 Forschungsstrasse, 111 5232 Villigen PSI Switzerland Telephone: +41 56 310 46 67 E-Mail: marc.caubet at psi.ch ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Dieter Mosbach Sent: Monday, April 6, 2020 2:50:28 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] "csm_resync_needed" after upgrading to GPFS v5.0.4-2 Am 06.04.2020 um 13:20 schrieb Caubet Serrabou Marc (PSI): > Hi all, > > > after upgrading one of the clusters to GPFS v5.0.4-2 and setting "minReleaseLevel 5.0.4.0" I started to see random "csm_resync_needed" errors on some nodes. 
This can be easily cleared with "mmhealth node show --resync", however after some minutes the error re-appears. > > Apparently, no errors in the log files and no apparent problems other than the "csm_resync_needed" error. > > > Before opening a support case, any hints about what could be the reason of that and whether I should worry about it? I would like to clarify what's going on before upgrading the main cluster. This seems to be a bug in v5, open a support case. We had to check: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "grep usesharedlib /var/mmfs/mmsysmon/mmsysmonitor.conf" and to change: mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "sed -i 's/usesharedlib = 1/usesharedlib = 0/g' /var/mmfs/mmsysmon/mmsysmonitor.conf" mmdsh -N [AIXNode1,AIXNode2,AIXNode3] "mmsysmoncontrol restart" Regards, Dieter _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Tue Apr 14 08:42:12 2020 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Tue, 14 Apr 2020 09:42:12 +0200 Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 Message-ID: *FYI* IBM Spectrum Discover is a next-generation metadata management solution that delivers exceptional performance at exabyte scale, so organizations can harness value from massive amounts of unstructured data from heterogeneous file and object storage on premises and in the cloud to create competitive advantage in the areas of analytics and AI initiatives, governance, and storage optimization. Here are other videos in this series of related IBM Spectrum Discover topics that give you examples to get started: 1) IBM Spectrum Discover: Download, Deploy, and Configure https://youtu.be/FMOuzn__qRI 2) IBM Spectrum Discover: Scanning S3 data sources such as Amazon S3 or Ceph https://youtu.be/zaADfeTGwzY 3) IBM Spectrum Discover: Scanning IBM Spectrum Scale (GPFS) and IBM ESS data sources https://youtu.be/3mBQciR2tXE 4) IBM Spectrum Discover: Scanning an IBM Spectrum Protect data source https://youtu.be/wdXvnJ_GEQs 5) IBM Spectrum Discover: Insights into your files for better TCO with IBM Spectrum Archive EE https://youtu.be/_YNfFDdMEa4 Appendix: Here are additional online educational materials related to IBM Spectrum Discover solutions: IBM Spectrum Discover Knowledge Center: https://www.ibm.com/support/knowledgecenter/SSY8AC IBM Spectrum Discover Free 90 Day Trial: https://www.ibm.com/us-en/marketplace/spectrum-discover IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage, REDP-5550: http://www.redbooks.ibm.com/abstracts/redp5550.html -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Am Weiher 24, 65451 Kelsterbach, Germany mailto:kraemerf at de.ibm.com Mobile +49171-3043699 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Apr 14 11:15:41 2020 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 14 Apr 2020 10:15:41 +0000 Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 In-Reply-To: References: Message-ID: <714908f022894851b52efa0944c80737@bham.ac.uk> Just a reminder that this is a Spectrum Scale technical forum and shouldn't be used for marketing nor advertising of other products. 
There are a number of vendors who have competing products who might also wish to post here. If you wish to discuss Discover at a technical level, there is a dedicated channel on the slack community for this. Thanks Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of kraemerf at de.ibm.com Sent: 14 April 2020 08:42 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] *NEWS* IBM Spectrum Discover for Storage Metadata Management Video April 09, 2020 *FYI* -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Apr 15 16:29:53 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 15:29:53 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 16:36:48 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 17:36:48 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: Message-ID: <20200415153648.GK30439@ics.muni.cz> Hello, I noticed this bug, it took about 10 minutes to crash. However, I'm seeing similar NULL pointer dereference even with older kernels, That dereference does not happen always in GPFS code, sometimes outside in NFS or elsewhere, however it looks familiar. I have many crashdumps about this. On Wed, Apr 15, 2020 at 03:29:53PM +0000, Felipe Knop wrote: > All, > ? > A problem has been identified with Spectrum Scale when running on RHEL 7.7 > and kernel 3.10.0-1062.18.1.el7.? While a fix is being currently > developed, customers should not move up to this kernel level. > ? > The new kernel was issued on March 17 via the following errata:? > [1]https://access.redhat.com/errata/RHSA-2020:0834 > ? > When this kernel is used with Scale, system crashes have been observed. > The following are a couple of examples of kernel stack traces for the > crash: > ? > ? > [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000040 > [ 2915.633770] IP: [] > cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > [ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 [mmfslinux] > [ 2915.921381]? [] ? take_dentry_name_snapshot+0xf0/0xf0 > [ 2915.928760]? [] ? shrink_dcache_parent+0x60/0x90 > [ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > [ 2915.941388]? [] do_rmdir+0x1f1/0x220 > [ 2915.947119]? [] ? __fput+0x186/0x260 > [ 2915.952849]? [] ? ____fput+0xe/0x10 > [ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > [ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > ? > [1224278.495993] [] __dentry_kill+0x128/0x190 > [1224278.496678] [] dput+0xb6/0x1a0 > [1224278.497378] [] d_prune_aliases+0xb6/0xf0 > [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 > [mmfslinux] > [1224278.498798] [] > _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > ? > ? > RHEL 7.8 is also impacted by the same problem, but validation of Scale > with 7.8 is still under way. > ? > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From laurence.schuler at nasa.gov Wed Apr 15 16:49:59 2020 From: laurence.schuler at nasa.gov (Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]) Date: Wed, 15 Apr 2020 15:49:59 +0000 Subject: [gpfsug-discuss] [EXTERNAL] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: Message-ID: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Will this impact *any* version of Spectrum Scale? -Laurence From: on behalf of Felipe Knop Reply-To: gpfsug main discussion list Date: Wednesday, April 15, 2020 at 11:30 AM To: "gpfsug-discuss at spectrumscale.org" Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel All, A problem has been identified with Spectrum Scale when running on RHEL 7.7 and kernel 3.10.0-1062.18.1.el7. While a fix is being currently developed, customers should not move up to this kernel level. The new kernel was issued on March 17 via the following errata: https://access.redhat.com/errata/RHSA-2020:0834 When this kernel is used with Scale, system crashes have been observed. The following are a couple of examples of kernel stack traces for the crash: [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [ 2915.633770] IP: [] cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] [ 2915.914097] [] gpfs_i_rmdir+0x29c/0x310 [mmfslinux] [ 2915.921381] [] ? take_dentry_name_snapshot+0xf0/0xf0 [ 2915.928760] [] ? shrink_dcache_parent+0x60/0x90 [ 2915.935656] [] vfs_rmdir+0xdc/0x150 [ 2915.941388] [] do_rmdir+0x1f1/0x220 [ 2915.947119] [] ? __fput+0x186/0x260 [ 2915.952849] [] ? ____fput+0xe/0x10 [ 2915.958484] [] ? task_work_run+0xc0/0xe0 [ 2915.964701] [] SyS_unlinkat+0x25/0x40 [1224278.495993] [] __dentry_kill+0x128/0x190 [1224278.496678] [] dput+0xb6/0x1a0 [1224278.497378] [] d_prune_aliases+0xb6/0xf0 [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 [mmfslinux] [1224278.498798] [] _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] RHEL 7.8 is also impacted by the same problem, but validation of Scale with 7.8 is still under way. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 9466 bytes Desc: not available URL: From knop at us.ibm.com Wed Apr 15 17:25:41 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 16:25:41 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> References: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov>, Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 17:35:12 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 18:35:12 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: <20200415163512.GP30439@ics.muni.cz> And are you sure it is present only in -1062.18.1.el7 kernel? I think it is present in all -1062.* kernels.. 
On Wed, Apr 15, 2020 at 04:25:41PM +0000, Felipe Knop wrote: > Laurence, > ? > The problem affects all the Scale releases / PTFs. > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > ? > ? > > ----- Original message ----- > From: "Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]" > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] [EXTERNAL] Kernel crashes with Spectrum > Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > Date: Wed, Apr 15, 2020 12:10 PM > ? > > Will this impact *any* version of Spectrum Scale? > > ? > > -Laurence > > ? > > From: on behalf of Felipe > Knop > Reply-To: gpfsug main discussion list > Date: Wednesday, April 15, 2020 at 11:30 AM > To: "gpfsug-discuss at spectrumscale.org" > > Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum Scale > and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > ? > > All, > > ? > > A problem has been identified with Spectrum Scale when running on RHEL > 7.7 and kernel 3.10.0-1062.18.1.el7.? While a fix is being currently > developed, customers should not move up to this kernel level. > > ? > > The new kernel was issued on March 17 via the following errata:? > [1]https://access.redhat.com/errata/RHSA-2020:0834 > > ? > > When this kernel is used with Scale, system crashes have been observed. > The following are a couple of examples of kernel stack traces for the > crash: > > ? > > ? > > [ 2915.625015] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000040 > [ 2915.633770] IP: [] > cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > > [ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 > [mmfslinux] > [ 2915.921381]? [] ? > take_dentry_name_snapshot+0xf0/0xf0 > [ 2915.928760]? [] ? shrink_dcache_parent+0x60/0x90 > [ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > [ 2915.941388]? [] do_rmdir+0x1f1/0x220 > [ 2915.947119]? [] ? __fput+0x186/0x260 > [ 2915.952849]? [] ? ____fput+0xe/0x10 > [ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > [ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > > ? > > [1224278.495993] [] __dentry_kill+0x128/0x190 > [1224278.496678] [] dput+0xb6/0x1a0 > [1224278.497378] [] d_prune_aliases+0xb6/0xf0 > [1224278.498083] [] cxiPruneDCacheEntry+0x13a/0x1c0 > [mmfslinux] > [1224278.498798] [] > _ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > > ? > > ? > > RHEL 7.8 is also impacted by the same problem, but validation of Scale > with 7.8 is still under way. > > ? > > ? > > ? Felipe > > ? > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > > ? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [2]http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > 2. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From knop at us.ibm.com Wed Apr 15 17:51:02 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 16:51:02 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <20200415163512.GP30439@ics.muni.cz> References: <20200415163512.GP30439@ics.muni.cz>, <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From xhejtman at ics.muni.cz Wed Apr 15 18:06:57 2020 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Wed, 15 Apr 2020 19:06:57 +0200 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: References: <20200415163512.GP30439@ics.muni.cz> <8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: <20200415170657.GQ30439@ics.muni.cz> Should I report then or just wait to fix 18.1 problem and see whether older ones are gone as well? On Wed, Apr 15, 2020 at 04:51:02PM +0000, Felipe Knop wrote: > Lukas, > ? > There was one particular kernel change introduced in 3.10.0-1062.18.1 that > has triggered a given set of crashes. It's possible, though, that there is > a lingering problem affecting older levels of 3.10.0-1062. I believe that > crashes occurring on older kernels should be treated as separate problems. > ? > ? Felipe > ? > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > ? > ? > ? > > ----- Original message ----- > From: Lukas Hejtmanek > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] Kernel crashes with Spectrum > Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > Date: Wed, Apr 15, 2020 12:35 PM > ? > And are you sure it is present only in -1062.18.1.el7 kernel? I think it > is > present in all -1062.* kernels.. > > On Wed, Apr 15, 2020 at 04:25:41PM +0000, Felipe Knop wrote: > > ? ?Laurence, > > ? ?? > > ? ?The problem affects all the Scale releases / PTFs. > > ? ?? > > ? ?? Felipe > > ? ?? > > ? ?---- > > ? ?Felipe Knop knop at us.ibm.com > > ? ?GPFS Development and Security > > ? ?IBM Systems > > ? ?IBM Building 008 > > ? ?2455 South Rd, Poughkeepsie, NY 12601 > > ? ?(845) 433-9314 T/L 293-9314 > > ? ?? > > ? ?? > > ? ?? > > > > ? ? ?----- Original message ----- > > ? ? ?From: "Schuler, Laurence (GSFC-606.4)[ADNET SYSTEMS INC]" > > ? ? ? > > ? ? ?Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ? ? ?To: gpfsug main discussion list > > > ? ? ?Cc: > > ? ? ?Subject: Re: [gpfsug-discuss] [EXTERNAL] Kernel crashes with > Spectrum > > ? ? ?Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > ? ? ?Date: Wed, Apr 15, 2020 12:10 PM > > ? ? ?? > > > > ? ? ?Will this impact *any* version of Spectrum Scale? > > > > ? ? ?? > > > > ? ? ?-Laurence > > > > ? ? ?? > > > > ? ? ?From: on behalf of > Felipe > > ? ? ?Knop > > ? ? ?Reply-To: gpfsug main discussion list > > > ? ? ?Date: Wednesday, April 15, 2020 at 11:30 AM > > ? ? ?To: "gpfsug-discuss at spectrumscale.org" > > ? ? ? > > ? ? ?Subject: [EXTERNAL] [gpfsug-discuss] Kernel crashes with Spectrum > Scale > > ? ? ?and RHEL 7.7 3.10.0-1062.18.1.el7 kernel > > > > ? ? ?? > > > > ? ? ?All, > > > > ? ? ?? > > > > ? ? ?A problem has been identified with Spectrum Scale when running on > RHEL > > ? ? ?7.7 and kernel 3.10.0-1062.18.1.el7.? 
While a fix is being > currently > > ? ? ?developed, customers should not move up to this kernel level. > > > > ? ? ?? > > > > ? ? ?The new kernel was issued on March 17 via the following errata:? > > ? ? ?[1][1]https://access.redhat.com/errata/RHSA-2020:0834? > > > > ? ? ?? > > > > ? ? ?When this kernel is used with Scale, system crashes have been > observed. > > ? ? ?The following are a couple of examples of kernel stack traces for > the > > ? ? ?crash: > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?[ 2915.625015] BUG: unable to handle kernel NULL pointer > dereference at > > ? ? ?0000000000000040 > > ? ? ?[ 2915.633770] IP: [] > > ? ? ?cxiDropSambaDCacheEntry+0x190/0x1b0 [mmfslinux] > > > > ? ? ?[ 2915.914097]? [] gpfs_i_rmdir+0x29c/0x310 > > ? ? ?[mmfslinux] > > ? ? ?[ 2915.921381]? [] ? > > ? ? ?take_dentry_name_snapshot+0xf0/0xf0 > > ? ? ?[ 2915.928760]? [] ? > shrink_dcache_parent+0x60/0x90 > > ? ? ?[ 2915.935656]? [] vfs_rmdir+0xdc/0x150 > > ? ? ?[ 2915.941388]? [] do_rmdir+0x1f1/0x220 > > ? ? ?[ 2915.947119]? [] ? __fput+0x186/0x260 > > ? ? ?[ 2915.952849]? [] ? ____fput+0xe/0x10 > > ? ? ?[ 2915.958484]? [] ? task_work_run+0xc0/0xe0 > > ? ? ?[ 2915.964701]? [] SyS_unlinkat+0x25/0x40 > > > > ? ? ?? > > > > ? ? ?[1224278.495993] [] __dentry_kill+0x128/0x190 > > ? ? ?[1224278.496678] [] dput+0xb6/0x1a0 > > ? ? ?[1224278.497378] [] d_prune_aliases+0xb6/0xf0 > > ? ? ?[1224278.498083] [] > cxiPruneDCacheEntry+0x13a/0x1c0 > > ? ? ?[mmfslinux] > > ? ? ?[1224278.498798] [] > > ? ? ?_ZN10gpfsNode_t16invalidateOSNodeEPS_Pvij+0x108/0x350 [mmfs26] > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?RHEL 7.8 is also impacted by the same problem, but validation of > Scale > > ? ? ?with 7.8 is still under way. > > > > ? ? ?? > > > > ? ? ?? > > > > ? ? ?? Felipe > > > > ? ? ?? > > > > ? ? ?---- > > ? ? ?Felipe Knop knop at us.ibm.com > > ? ? ?GPFS Development and Security > > ? ? ?IBM Systems > > ? ? ?IBM Building 008 > > ? ? ?2455 South Rd, Poughkeepsie, NY 12601 > > ? ? ?(845) 433-9314 T/L 293-9314 > > ? ? ?? > > > > ? ? ?? > > ? ? ?_______________________________________________ > > ? ? ?gpfsug-discuss mailing list > > ? ? ?gpfsug-discuss at spectrumscale.org > > ? ? ?[2][2]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > > > ? ?? > > > > References > > > > ? ?Visible links > > ? ?1. [3]https://access.redhat.com/errata/RHSA-2020:0834? > > ? ?2. [4]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > [5]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > ??Full Time Multitasking Ninja > ??is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [6]http://gpfsug.org/mailman/listinfo/gpfsug-discuss? > ? > > ? > > References > > Visible links > 1. https://access.redhat.com/errata/RHSA-2020:0834 > 2. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 3. https://access.redhat.com/errata/RHSA-2020:0834 > 4. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 5. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > 6. http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From knop at us.ibm.com Wed Apr 15 19:17:15 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 15 Apr 2020 18:17:15 +0000 Subject: [gpfsug-discuss] Kernel crashes with Spectrum Scale and RHEL 7.7 3.10.0-1062.18.1.el7 kernel In-Reply-To: <20200415170657.GQ30439@ics.muni.cz> References: <20200415170657.GQ30439@ics.muni.cz>, <20200415163512.GP30439@ics.muni.cz><8A1DF819-E3B3-4727-8CA6-F26C5BFC09B4@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Thu Apr 16 04:26:36 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Thu, 16 Apr 2020 03:26:36 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing Message-ID: Hello All, As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). We would really like to stay with SS/GPFS and have been big advocates of SS/GPFS over the years, but the capacity based licensing is pushing us into evaluating alternatives. I realize this may not be proper to discuss this directly in this email list, so feel free to email directly with your suggestions or your plans. Thanks and kind regards, Dean -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Thu Apr 16 09:16:59 2020 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 16 Apr 2020 08:16:59 +0000 Subject: [gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected Message-ID: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> Hello, I?m puzzled about the difference between the two mmhealth events longwaiters_found ERROR Detected Spectrum Scale long-waiters and deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock Especially why the later has level WARNING only while the first has level ERROR? Longwaiters_found is based on the output of ?mmdiag ?deadlock? and occurs much more often on our clusters, while the later probably is triggered by an external event and no internal mmsysmon check? Deadlock detection is handled by mmfsd? Whenever a deadlock is detected some debug data is collected, which is not true for longwaiters_detected. Hm, so why is no deadlock detected whenever mmdiag ?deadlock shows waiting threads? Shouldn?t the severity be the opposite way? Finally: Can we trigger some debug data collection whenever a longwaiters_found event happens ? just getting the output of ?mmdiag ?deadlock? on the single node could give some hints. Without I don?t see any real chance to take any action. Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... 
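Until there is an automated hook for it, the state behind both events can at least be captured by hand on the node that raised them, for example:

mmdiag --deadlock       # what the longwaiters_found check is based on
mmdiag --waiters        # the individual long-running waiters behind it
mmhealth node eventlog  # when each longwaiters_found / deadlock_detected event fired

All three are read-only and safe to run while the event is active.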
URL: From Anna.Greim at de.ibm.com Thu Apr 16 11:55:56 2020 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 16 Apr 2020 12:55:56 +0200 Subject: [gpfsug-discuss] =?utf-8?q?Mmhealth_events_longwaiters=5Ffound_an?= =?utf-8?q?d=09deadlock=5Fdetected?= In-Reply-To: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> References: <04D32874-9ABD-4591-8C2B-19D596789ED5@id.ethz.ch> Message-ID: Hi Heiner, I'm not really able to give you insights into the decision of the events' states. Maybe somebody else is able to answer here. But about your triggering debug data collection question, please have a look at this documentation page: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adv_createscriptforevents.htm This feature is in the product since the 5.0.x versions and should be helpful here. It will trigger your eventsCallback script when the event is raised. One of the script's arguments is the event name. So it is possible to create a script, that checks for the event name longwaiters_found and then triggers a mmdiag --deadlock and write it into a txt file. The script call has a hard time out of 60 seconds so it does not interfere too much with the mmsysmon internals, but better would be a run time less than 1 second. Mit freundlichen Gr??en / Kind regards Anna Greim Software Engineer, Spectrum Scale Development IBM Systems IBM Data Privacy Statement IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Gregor Pillen Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 16/04/2020 10:36 Subject: [EXTERNAL] [gpfsug-discuss] Mmhealth events longwaiters_found and deadlock_detected Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I?m puzzled about the difference between the two mmhealth events longwaiters_found ERROR Detected Spectrum Scale long-waiters and deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock Especially why the later has level WARNING only while the first has level ERROR? Longwaiters_found is based on the output of ?mmdiag ?deadlock? and occurs much more often on our clusters, while the later probably is triggered by an external event and no internal mmsysmon check? Deadlock detection is handled by mmfsd? Whenever a deadlock is detected some debug data is collected, which is not true for longwaiters_detected. Hm, so why is no deadlock detected whenever mmdiag ?deadlock shows waiting threads? Shouldn?t the severity be the opposite way? Finally: Can we trigger some debug data collection whenever a longwaiters_found event happens ? just getting the output of ?mmdiag ?deadlock? on the single node could give some hints. Without I don?t see any real chance to take any action. Thank you, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=XLDdnBDnIn497KhM7_npStR6ig1r198VHeSBY1WbuHc&m=QAa_5ZRNpy310ikXZzwunhWU4TGKsH_NWDoYwh57MNo&s=dKWX1clbfClbfJb5yKSzhoNC1aqCbT6-7s1DQdx8CzY&e= -------------- next part -------------- An HTML attachment was scrubbed... 
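A minimal sketch of such a callback, untested, assuming the hook path /var/mmfs/etc/eventsCallback described on the linked page; it greps the whole argument list for the event name rather than relying on a fixed argument position, and stays well under the 60-second limit mentioned above:

#!/bin/bash
# /var/mmfs/etc/eventsCallback - dump long-waiter details when the event fires
case "$*" in
  *longwaiters_found*)
    out=/var/mmfs/tmp/longwaiters.$(date +%Y%m%d-%H%M%S)   # dump location is arbitrary
    timeout 20 /usr/lpp/mmfs/bin/mmdiag --deadlock  > "$out" 2>&1
    timeout 20 /usr/lpp/mmfs/bin/mmdiag --waiters  >> "$out" 2>&1
    ;;
esac
exit 0

Make it executable (chmod 755 /var/mmfs/etc/eventsCallback) on the nodes where the event shows up; no restart of mmsysmon should be needed, but that is worth double-checking.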
URL: From carlz at us.ibm.com Thu Apr 16 13:44:14 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 16 Apr 2020 12:44:14 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: Folks, I need to correct a common misunderstanding that is perpetuated here: > As IBM has completely switched to capacity based licensing in order to use SS v5 For new customers, Scale is priced Per TB (we also have Per PB licenses now for convenience). This transition was completed in January 2019. And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs. Existing customers with Standard sockets can remain on and continue to buy more Standard sockets. There is no plan to end that entitlement. The same applies to customers with Advanced sockets who want to continue with Advanced. In both cases you can upgrade from V4.2 to V5.0 without changing your license metric. This licensing change is not connected to the migration from V4 to V5. However, I do see a lot of confusion around this point, including from my IBM colleagues, possibly because both transitions occurred around roughly the same time period. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com ? From dean.flanders at fmi.ch Thu Apr 16 14:00:49 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Thu, 16 Apr 2020 13:00:49 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hello Carl, Yes, for existing IBM direct customers that may have been the case for v4 to v5. However, from my understanding if a customer bought GPFS/SS via DDN, Lenovo, etc. with embedded systems licenses, this is not the case. From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 2:44 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Folks, I need to correct a common misunderstanding that is perpetuated here: > As IBM has completely switched to capacity based licensing in order to > use SS v5 For new customers, Scale is priced Per TB (we also have Per PB licenses now for convenience). This transition was completed in January 2019. And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs. Existing customers with Standard sockets can remain on and continue to buy more Standard sockets. There is no plan to end that entitlement. The same applies to customers with Advanced sockets who want to continue with Advanced. In both cases you can upgrade from V4.2 to V5.0 without changing your license metric. This licensing change is not connected to the migration from V4 to V5. However, I do see a lot of confusion around this point, including from my IBM colleagues, possibly because both transitions occurred around roughly the same time period. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com ? 
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From eric.wonderley at vt.edu  Thu Apr 16 17:32:29 2020
From: eric.wonderley at vt.edu (J. Eric Wonderley)
Date: Thu, 16 Apr 2020 12:32:29 -0400
Subject: [gpfsug-discuss] gpfs filesets question
Message-ID: 

I have filesets setup in a filesystem...looks like:

[root at cl005 ~]# mmlsfileset home -L
Filesets in file system 'home':
Name         Id  RootInode  ParentId  Created                   InodeSpace  MaxInodes  AllocInodes  Comment
root          0          3        --  Tue Jun 30 07:54:09 2015           0  402653184    320946176  root fileset
hess          1  543733376         0  Tue Jun 13 14:56:13 2017           0          0            0
predictHPC    2    1171116         0  Thu Jan  5 15:16:56 2017           0          0            0
HYCCSIM       3  544258049         0  Wed Jun 14 10:00:41 2017           0          0            0
socialdet     4  544258050         0  Wed Jun 14 10:01:02 2017           0          0            0
arc           5    1171073         0  Thu Jan  5 15:07:09 2017           0          0            0
arcadm        6    1171074         0  Thu Jan  5 15:07:10 2017           0          0            0

I believe these are dependent filesets, dependent on the root fileset. Anyhow, a user wants to move a large amount of data from one fileset to another. Would this be a metadata-only operation? He has attempted to move a small amount of data and has noticed some thrashing.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stockf at us.ibm.com  Thu Apr 16 18:11:40 2020
From: stockf at us.ibm.com (Frederick Stock)
Date: Thu, 16 Apr 2020 17:11:40 +0000
Subject: [gpfsug-discuss] gpfs filesets question
In-Reply-To: 
References: 
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From eric.wonderley at vt.edu  Thu Apr 16 18:36:35 2020
From: eric.wonderley at vt.edu (J. Eric Wonderley)
Date: Thu, 16 Apr 2020 13:36:35 -0400
Subject: [gpfsug-discuss] gpfs filesets question
In-Reply-To: 
References: 
Message-ID: 

Hi Fred:
I do. I have 3 pools: system, an ssd data pool (fc_ssd400G) and a spinning disk pool (fc_8T). I want to think the ssd_data_pool is empty at the moment and the system pool is ssd and only contains metadata.

[root at cl005 ~]# mmdf home -P fc_ssd400G
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: fc_ssd400G (Maximum disk size allowed is 97 TB)
r10f1e8            1924720640     1001 No       Yes      1924644864 (100%)          9728 ( 0%)
r10f1e7            1924720640     1001 No       Yes      1924636672 (100%)         17408 ( 0%)
r10f1e6            1924720640     1001 No       Yes      1924636672 (100%)         17664 ( 0%)
r10f1e5            1924720640     1001 No       Yes      1924644864 (100%)          9728 ( 0%)
r10f6e8            1924720640     1001 No       Yes      1924644864 (100%)          9728 ( 0%)
r10f1e9            1924720640     1001 No       Yes      1924644864 (100%)          9728 ( 0%)
r10f6e9            1924720640     1001 No       Yes      1924644864 (100%)          9728 ( 0%)
                ------------- -------------------- -------------------
(pool total)      13473044480     13472497664 (100%)        83712 ( 0%)

More or less empty. Interesting...

On Thu, Apr 16, 2020 at 1:11 PM Frederick Stock wrote:
> Do you have more than one GPFS storage pool in the system? If you do and
> they align with the filesets then that might explain why moving data from
> one fileset to another is causing increased IO operations.
>
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> stockf at us.ibm.com
>
>
> ----- Original message -----
> From: "J. Eric Wonderley"
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list
> Cc:
> Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question
> Date: Thu, Apr 16, 2020 12:32 PM
>
> I have filesets setup in a filesystem...looks like:
> [root at cl005 ~]# mmlsfileset home -L
> Filesets in file system 'home':
> Name         Id  RootInode  ParentId  Created                   InodeSpace  MaxInodes  AllocInodes  Comment
> root          0          3        --  Tue Jun 30 07:54:09 2015           0  402653184    320946176  root fileset
> hess          1  543733376         0  Tue Jun 13 14:56:13 2017           0          0            0
> predictHPC    2    1171116         0  Thu Jan  5 15:16:56 2017           0          0            0
> HYCCSIM       3  544258049         0  Wed Jun 14 10:00:41 2017           0          0            0
> socialdet     4  544258050         0  Wed Jun 14 10:01:02 2017           0          0            0
> arc           5    1171073         0  Thu Jan  5 15:07:09 2017           0          0            0
> arcadm        6    1171074         0  Thu Jan  5 15:07:10 2017           0          0            0
>
> I believe these are dependent filesets, dependent on the root fileset.
> Anyhow, a user wants to move a large amount of data from one fileset to
> another. Would this be a metadata-only operation? He has attempted to
> move a small amount of data and has noticed some thrashing.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stockf at us.ibm.com  Thu Apr 16 18:55:09 2020
From: stockf at us.ibm.com (Frederick Stock)
Date: Thu, 16 Apr 2020 17:55:09 +0000
Subject: [gpfsug-discuss] gpfs filesets question
In-Reply-To: 
References: 
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From bbanister at jumptrading.com  Thu Apr 16 19:25:33 2020
From: bbanister at jumptrading.com (Bryan Banister)
Date: Thu, 16 Apr 2020 18:25:33 +0000
Subject: [gpfsug-discuss] gpfs filesets question
In-Reply-To: 
References: 
Message-ID: 

If my memory serves... any move of files between filesets requires data to be moved, regardless of pool allocation for the files that need to be moved, and regardless of whether they are dependent filesets that are both in the same independent fileset.
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of J. Eric Wonderley
Sent: Thursday, April 16, 2020 12:37 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] gpfs filesets question

[EXTERNAL EMAIL]
Hi Fred:
I do. I have 3 pools: system, an ssd data pool (fc_ssd400G) and a spinning disk pool (fc_8T). I want to think the ssd_data_pool is empty at the moment and the system pool is ssd and only contains metadata.
[root at cl005 ~]# mmdf home -P fc_ssd400G disk disk size failure holds holds free KB free KB name in KB group metadata data in full blocks in fragments --------------- ------------- -------- -------- ----- -------------------- ------------------- Disks in storage pool: fc_ssd400G (Maximum disk size allowed is 97 TB) r10f1e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e7 1924720640 1001 No Yes 1924636672 (100%) 17408 ( 0%) r10f1e6 1924720640 1001 No Yes 1924636672 (100%) 17664 ( 0%) r10f1e5 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e8 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f1e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) r10f6e9 1924720640 1001 No Yes 1924644864 (100%) 9728 ( 0%) ------------- -------------------- ------------------- (pool total) 13473044480 13472497664 (100%) 83712 ( 0%) More or less empty. Interesting... On Thu, Apr 16, 2020 at 1:11 PM Frederick Stock > wrote: Do you have more than one GPFS storage pool in the system? If you do and they align with the filesets then that might explain why moving data from one fileset to another is causing increased IO operations. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: "J. Eric Wonderley" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question Date: Thu, Apr 16, 2020 12:32 PM I have filesets setup in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Apr 16 17:50:42 2020 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 16 Apr 2020 16:50:42 +0000 Subject: [gpfsug-discuss] gpfs filesets question Message-ID: Moving data between filesets is like moving files between file systems. Normally when you move files between directories, it?s simple metadata, but with filesets (dependent or independent) is a full copy and delete of the old data. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "J. 
Eric Wonderley" Reply-To: gpfsug main discussion list Date: Thursday, April 16, 2020 at 11:32 AM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question I have filesets setup in a filesystem...looks like: [root at cl005 ~]# mmlsfileset home -L Filesets in file system 'home': Name Id RootInode ParentId Created InodeSpace MaxInodes AllocInodes Comment root 0 3 -- Tue Jun 30 07:54:09 2015 0 402653184 320946176 root fileset hess 1 543733376 0 Tue Jun 13 14:56:13 2017 0 0 0 predictHPC 2 1171116 0 Thu Jan 5 15:16:56 2017 0 0 0 HYCCSIM 3 544258049 0 Wed Jun 14 10:00:41 2017 0 0 0 socialdet 4 544258050 0 Wed Jun 14 10:01:02 2017 0 0 0 arc 5 1171073 0 Thu Jan 5 15:07:09 2017 0 0 0 arcadm 6 1171074 0 Thu Jan 5 15:07:10 2017 0 0 0 I beleive these are dependent filesets. Dependent on the root fileset. Anyhow a user wants to move a large amount of data from one fileset to another. Would this be a metadata only operation? He has attempted to small amount of data and has noticed some thrasing. -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Apr 16 21:24:51 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 16 Apr 2020 20:24:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: > From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses >are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. 
Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com From mhennecke at lenovo.com Thu Apr 16 22:19:13 2020 From: mhennecke at lenovo.com (Michael Hennecke) Date: Thu, 16 Apr 2020 21:19:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - Lenovo information Message-ID: Hi, Thanks a lot Carl for these clarifications. Some additions from the Lenovo side: Lenovo *GSS* (which is no longer sold, but still fully supported) uses the socked-based Spectrum Scale Standard Edition or Advanced Edition. We provide both a 4.2 based version and a 5.0 based version of the GSS installation packages. Customers get access to the Edition they acquired with their GSS system(s), and they can choose to install the 4.2 or the 5.0 code. Lenovo GSS customers are automatically entitled for those GSS downloads. Customers who acquired a GSS system when System x was still part of IBM can also obtain the latest GSS installation packages from Lenovo (v4 and v5), but will need to provide a valid proof of entitlement of their Spectrum Scale licenses before being granted access. Lenovo *DSS-G* uses capacity-based licensing (per-disk or per-TB), with the Spectrum Scale Data Access Edition or Data Management Edition. For DSS-G we also provide both a 4.2 based installation package and a 5.0 based installation package, and customers can choose which one to install. Note that the Lenovo installation tarballs for DSS-G are named for example "dss-g-2.6a-standard-5.0.tgz" (installation package includes the capacity-based DAE) or "dss-g-2.6a-advanced-5.0.tgz" (installation package includes the capacity-based DME), so the Lenovo naming convention for the DSS-G packages is not identical with the naming of the Scale Edition that it includes. PS: There is no path to change a GSS system from a socket license to a capacity license. Replacing it with a DSS-G will of course also replace the licenses, as DSS-G comes with capacity-based licenses. Mit freundlichen Gr?ssen / Best regards, Michael Hennecke HPC Chief Technologist - HPC and AI Business Unit? -- Lenovo Global Technology (Germany) GmbH * Am Zehnthof 77 * D-45307 Essen * Germany Gesch?ftsf?hrung: Colm Gleeson, Christophe Laurent * Sitz der Gesellschaft: Stuttgart * HRB-Nr.: 758298, AG Stuttgart -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: [External] Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. 
The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Fri Apr 17 00:48:18 2020 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Thu, 16 Apr 2020 23:48:18 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: Plus one. It is not just volume licensing. The socket licensing costs have gone through the roof, at least in Australia. IBM tempts you with a cheap introduction and then once you are hooked, ramps up the price. They are counting on the migration costs outweighing the licensing fee increases. Unfortunately, our management won't stand for this business approach, so we get to do the migrations (boring as the proverbial bat ... you know what.) I think this forum is a good place to discuss it. IBM and customers on here need to know all about it. It is a user group after all and moving away from a product is part of the lifecycle. We were going to use GPFS for HPC scratch but went to market and ended up with BeeGFS. Further pricing pressure has meant GPFS is being phased out in all areas. 
We split our BeeGFS cluster of NVMe servers in half on arrival and have been trying other filesystems on half of it. We were going to try GPFS ECE but given the pricing we have been quoted have decided not to waste our time. We are gearing up to try Lustre on it. We have also noted the feature improvements with Lustre. Maybe if IBM had saved the money that a rebranding costs (GPFS to Spectrum Scale) they would not have had to crank up the price of GPFS? Cheers, Greg From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 1:27 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Spectrum Scale licensing Hello All, As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). We would really like to stay with SS/GPFS and have been big advocates of SS/GPFS over the years, but the capacity based licensing is pushing us into evaluating alternatives. I realize this may not be proper to discuss this directly in this email list, so feel free to email directly with your suggestions or your plans. Thanks and kind regards, Dean -------------- next part -------------- An HTML attachment was scrubbed... URL: From dean.flanders at fmi.ch Fri Apr 17 01:40:22 2020 From: dean.flanders at fmi.ch (Flanders, Dean) Date: Fri, 17 Apr 2020 00:40:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. 
The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From sedl at re-store.net Fri Apr 17 03:06:57 2020 From: sedl at re-store.net (Michael Sedlmayer) Date: Fri, 17 Apr 2020 02:06:57 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". 
However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction > From my understanding existing customers from DDN, Lenovo, etc. that >have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. 
Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From steve.hindmarsh at crick.ac.uk Fri Apr 17 08:35:51 2020 From: steve.hindmarsh at crick.ac.uk (Steve Hindmarsh) Date: Fri, 17 Apr 2020 07:35:51 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: , Message-ID: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> We are caught in the same position (12 PB on DDN GridScaler) and currently unable to upgrade to v5. If the position between IBM and DDN can?t be resolved, an extension of meaningful support from IBM (i.e. critical patches not just a sympathetic ear) for OEM licences would make a *huge* difference to those of us who need to provide critical production research data services on current equipment for another few years at least - with appropriate paid vendor support of course. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute Sent from my mobile On 17 Apr 2020, at 03:07, Michael Sedlmayer wrote: ?One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. 
I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. 
Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Apr 17 09:19:39 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 17 Apr 2020 08:19:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> References: , , <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> Message-ID: Especially with the pandemic. No one is exactly sure what next year?s budget is going to look like. I wouldn?t expect to be buying large amounts of storage to replace so far perfectly good storage. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Apr 17, 2020, at 03:36, Steve Hindmarsh wrote: ? We are caught in the same position (12 PB on DDN GridScaler) and currently unable to upgrade to v5. If the position between IBM and DDN can?t be resolved, an extension of meaningful support from IBM (i.e. critical patches not just a sympathetic ear) for OEM licences would make a *huge* difference to those of us who need to provide critical production research data services on current equipment for another few years at least - with appropriate paid vendor support of course. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute Sent from my mobile On 17 Apr 2020, at 03:07, Michael Sedlmayer wrote: ?One more important distinction with the DDN installations. Most DDN systems were deployed with an OEM license of GPFS v4. That license allowed DDN to use GPFS on their hardware appliance, but and didn't ever equate to an IBM software license. To my knowledge, DDN has not been a reseller of IBM licenses. 
We've had a lot of issues where our DDN users wanted to upgrade to Spectrum Scale 5; DDN couldn't provide the licensed code; and the user learned that they really didn't own IBM software (just the right to use the software on their DDN system) -michael Michael Sedlmayer -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Flanders, Dean Sent: Thursday, April 16, 2020 5:40 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Hello Carl, Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking of these issues in their long term planning. Thanks and kind regards, Dean -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Carl Zetie - carlz at us.ibm.com Sent: Thursday, April 16, 2020 10:25 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction From my understanding existing customers from DDN, Lenovo, etc. that have v4 with socket based licenses are not entitled v5 licenses socket licenses. Is that a correct understanding? It is not, and I apologize in advance for the length of this explanation. I want to be precise and as transparent as possible while respecting the confidentiality of our OEM partners and the contracts we have with them, and there is a lot of misinformation out there. The short version is that the same rules apply to DDN, Lenovo, and other OEM systems that apply to IBM ESS. You can update your system in place and keep your existing metric, as long as your vendor can supply you with V5 for that hardware. The update from V4 to V5 is not relevant. The long version: We apply the same standard to our OEM's systems as to our own ESS: they can upgrade their existing customers on their existing OEM systems to V5 and stay on Sockets, *provided* that the OEM has entered into an OEM license for Scale V5 and can supply it, and *provided* that the hardware is still supported by the software stack. But new customers and new OEM systems are all licensed by Capacity. This also applies to IBM's own ESS: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with a new ESS, that will come with capacity licenses. (Lenovo may want to chime in about their own GSS customers here, who have Socket licenses, and DSS-G customers, who have Capacity licenses). Existing systems that originally shipped with Socket licenses are "grandfathered in". And of course, if you move from a Lenovo system to an IBM system, or from an IBM system to a Lenovo system, or any other change of suppliers, that new system will come with capacity licenses, simply because it's a new system. 
If you're replacing an old system running with V4 with a new one running V5 it might look like you are forced to switch to update, but that's not the case: if you replace an old "grandfathered in" system that you had already updated to V5 on Sockets, your new system would *still* come with Capacity licenses - again, because it's a new system. Now where much of the confusion occurs is this: What if your supplier does not provide an update to V5 at all, *neither as Capacity nor Socket licenses*? Then you have no choice: to get to V5, you have to move to a new supplier, and consequently you have to move to Capacity licensing. But once again, it's not that moving from V4 to V5 requires a change of metric; it's moving to a new system from a new supplier. I hope that helps to make things clearer. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7C%7C75b2fc2faa9347d0fcbd08d7e274083b%7C4eed7807ebad415aa7a99170947f4eae%7C0%7C1%7C637226860297343792&sdata=CSG%2FpHcDNU3ZAyp2hdI4oZXCO20UtUmUuzLe05uqRiI%3D&reserved=0 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.horton at icr.ac.uk Fri Apr 17 10:29:52 2020 From: robert.horton at icr.ac.uk (Robert Horton) Date: Fri, 17 Apr 2020 09:29:52 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> References: , <26D9CF79-4118-4F5A-AF96-CA72AA1EE712@crick.ac.uk> Message-ID: We're in the same boat. I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) but I really wish they would sort something out. Rob On Fri, 2020-04-17 at 07:35 +0000, Steve Hindmarsh wrote: > CAUTION: This email originated from outside of the ICR. Do not click > links or open attachments unless you recognize the sender's email > address and know the content is safe. > > We are caught in the same position (12 PB on DDN GridScaler) and > currently unable to upgrade to v5. 
> > If the position between IBM and DDN can?t be resolved, an extension > of meaningful support from IBM (i.e. critical patches not just a > sympathetic ear) for OEM licences would make a *huge* difference to > those of us who need to provide critical production research data > services on current equipment for another few years at least - with > appropriate paid vendor support of course. > > Best, > Steve > > Steve Hindmarsh > Head of Scientific Computing > The Francis Crick Institute -- Robert Horton | Research Data Storage Lead The Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB T +44 (0)20 7153 5350 | E robert.horton at icr.ac.uk | W www.icr.ac.uk | Twitter @ICR_London Facebook: www.facebook.com/theinstituteofcancerresearch The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer and network. From yeep at robust.my Fri Apr 17 11:31:49 2020 From: yeep at robust.my (T.A. Yeep) Date: Fri, 17 Apr 2020 18:31:49 +0800 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hi Carl, I'm confused here, in the previous email it was said *And for ESS, it is licensed Per Drive with different prices for HDDs and SSDs.* But then you mentioned in below email that: But new customers and new OEM systems are *all licensed by Capacity. This also applies to IBM's own ESS*: you can keep upgrading your old (if hardware is supported) gen 1 ESS on Sockets, but if you replace it with *a new ESS, that will come with capacity licenses*. Now the question, ESS is license per Drive or by capacity? .On Fri, Apr 17, 2020 at 4:25 AM Carl Zetie - carlz at us.ibm.com < carlz at us.ibm.com> wrote: > > From my understanding existing customers from DDN, Lenovo, etc. that > have v4 with socket based licenses > >are not entitled v5 licenses socket licenses. Is that a correct > understanding? > > It is not, and I apologize in advance for the length of this explanation. > I want to be precise and as transparent as possible while respecting the > confidentiality of our OEM partners and the contracts we have with them, > and there is a lot of misinformation out there. > > The short version is that the same rules apply to DDN, Lenovo, and other > OEM systems that apply to IBM ESS. You can update your system in place and > keep your existing metric, as long as your vendor can supply you with V5 > for that hardware. The update from V4 to V5 is not relevant. > > > The long version: > > We apply the same standard to our OEM's systems as to our own ESS: they > can upgrade their existing customers on their existing OEM systems to V5 > and stay on Sockets, *provided* that the OEM has entered into an OEM > license for Scale V5 and can supply it, and *provided* that the hardware is > still supported by the software stack. But new customers and new OEM > systems are all licensed by Capacity. This also applies to IBM's own ESS: > you can keep upgrading your old (if hardware is supported) gen 1 ESS on > Sockets, but if you replace it with a new ESS, that will come with capacity > licenses. 
(Lenovo may want to chime in about their own GSS customers here, > who have Socket licenses, and DSS-G customers, who have Capacity licenses). > Existing systems that originally shipped with Socket licenses are > "grandfathered in". > > And of course, if you move from a Lenovo system to an IBM system, or from > an IBM system to a Lenovo system, or any other change of suppliers, that > new system will come with capacity licenses, simply because it's a new > system. If you're replacing an old system running with V4 with a new one > running V5 it might look like you are forced to switch to update, but > that's not the case: if you replace an old "grandfathered in" system that > you had already updated to V5 on Sockets, your new system would *still* > come with Capacity licenses - again, because it's a new system. > > Now where much of the confusion occurs is this: What if your supplier does > not provide an update to V5 at all, *neither as Capacity nor Socket > licenses*? Then you have no choice: to get to V5, you have to move to a new > supplier, and consequently you have to move to Capacity licensing. But once > again, it's not that moving from V4 to V5 requires a change of metric; it's > moving to a new system from a new supplier. > > I hope that helps to make things clearer. > > > > Carl Zetie > Program Director > Offering Management > Spectrum Scale > ---- > (919) 473 3318 ][ Research Triangle Park > carlz at us.ibm.com > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. Yeep*Mobile: 016-719 8506 | Tel/Fax: 03-6261 7237 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Apr 17 11:50:22 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 11:50:22 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: On 17/04/2020 11:31, T.A. Yeep wrote: > Hi Carl, > > I'm confused here, in the previous email it was said *And for ESS, it is > licensed?Per Drive with different prices for HDDs and SSDs.* > > But then you mentioned in below email that: > But new customers and new OEM systems are *all licensed by Capacity. > This also applies to IBM's own ESS*: you can keep upgrading your old (if > hardware is supported) gen 1 ESS on Sockets, but if you replace it with > *a new ESS, that will come with capacity licenses*. > > Now the question, ESS is license per Drive or by capacity? > Well by drive is "capacity" based licensing unless you have some sort of magical infinite capacity drives :-) Under the PVU scheme if you know what you are doing you could game the system. For example get a handful of servers get PVU licenses for them create a GPFS file system handing off the back using say Fibre Channel and cheap FC attached arrays (Dell MD3000 series springs to mind) and then hang many PB off the back. I could using this scheme create a 100PB filesystem for under a thousand PVU of GPFS server licenses. Add in another cluster for protocol nodes and if you are not mounting on HPC nodes that's a winner :-) In a similar manner I use a pimped out ancient Dell R300 with dual core Xeon for backing up my GPFS filesystem because it's 100PVU of TSM licensing and I am cheap, and besides it is more than enough grunt for the job. A new machine would be 240 PVU minimum (4*70). 
I plan on replacing the PERC SAS6 card with a H710 and new internal cabling to run RHEL8 :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From jonathan.buzzard at strath.ac.uk Fri Apr 17 12:02:44 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 12:02:44 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: On 16/04/2020 04:26, Flanders, Dean wrote: > Hello All, > > As IBM has completely switched to capacity based licensing in order to > use SS v5 I was wondering how others are dealing with this? We do not > find the capacity based licensing sustainable. Our long term plan is to > migrate away from SS v5 to Lustre, and based on the Lustre roadmap we > have seen it should have the features we need within the next ~1 year > (we are fortunate to have good contacts). The problem is the features of Lustre that are missing in GPFS :-) For example have they removed the Lustre feature where roughly biannually the metadata server kernel panics introducing incorrectable corruption into the file system that will within six months cause constant crashes of the metadata node to the point where the file system is unusable? In best slashdot car analogy GPFS is like driving round in a Aston Martin DB9, where Lustre is like having a Ford Pinto. You will never be happy with Pinto in my experience having gone from the DB9 to the Pinto and back to the DB9. That said if you use Lustre as a high performance scratch file system fro HPC and every ~6 months do a shutdown and upgrade, and at the same time reformat your Lustre file system you will be fine. Our experience with Lustre was so bad we specifically excluded it as an option for our current HPC system when it went out to tender. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From carlz at us.ibm.com Fri Apr 17 13:10:00 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 12:10:00 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <19F21F2C-901E-4A04-AB94-740E2C2B5205@us.ibm.com> >Now the question, ESS is license per Drive or by capacity? I apologize for the confusion. Within IBM Storage when we say ?capacity? licensing we use that as an umbrella term for both Per TB/PB *or* Per Drive (HDD or SSD). This is contrasted with ?processor? metrics including Socket and the even older PVU licensing. And yes, we IBMers should be more careful about our tendency to use terminology that nobody else in the world does. (Don?t get me started on terabyte versus tebibyte?). So, for the sake of completeness and for anybody reviewing the thread in the future: * Per Drive is available with ESS, Lenovo DSS, and a number of other OEM solutions*. * Per TB/Per PB is available for software defined storage, including some OEM solutions - basically anywhere where figuring out the number of physical drives is infeasible.** * You can if you wish license ESS with Per TB/PB, for example if you want to have a single pool of licensing across an environment that mixes software-defined, ESS, or public cloud; or if you want to include your ESS licenses in an ELA. This is almost always more expensive than Per Drive, but some customers are willing to pay for the privilege of the flexibility. I hope that helps. 
*(In some cases the customer may not even know it because the OEM solution is sold as a whole with a bottom line price, and the customer does not see a line item price for Scale. In at least one case, the vertical market solution doesn?t even expose the fact that the storage is provided by Scale.) **(Imagine trying to figure out the ?real? number of drives in a high-end storage array that does RAIDing, hides some drives as spares, offers thin provisioning, etc. Or on public cloud where the ?drives? are all virtual.) Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1886717044] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From carlz at us.ibm.com Fri Apr 17 13:16:38 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 12:16:38 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> Rob Horton wrote: >I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) >but I really wish they would sort something out. Yes, it?s a pain. IBM and DDN are trying very hard to work something out, but it?s hard to get all the ?I?s dotted and ?T?s crossed with our respective legal and exec reviewers so that when we do say something it will be complete, clear, and final; and not require long, baroque threads for people to figure out where exactly they are? I wish I could say more, but I need to respect the confidentiality of the relationship and the live discussion. In the meantime, I thank you for your patience, and ask that you not believe any rumors you might hear, because whatever they are, they are wrong (or at least incomplete). In this situation, as a wise man once observed, ?those who Say don?t Know; those who Know don?t Say?. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_749317756] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From aaron.knister at gmail.com Fri Apr 17 14:15:07 2020 From: aaron.knister at gmail.com (Aaron Knister) Date: Fri, 17 Apr 2020 09:15:07 -0400 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: Yeah, I had similar experiences in the past (over a decade ago) with Lustre and was heavily heavily anti-Lustre. That said, I just finished several weeks of what I?d call grueling testing of DDN Lustre and GPFS on the same hardware and I?m reasonably convinced much of that is behind us now (things like stability, metadata performance, random I/O performance just don?t appear to be issues anymore and in some cases these operations are now faster in Lustre). Full disclosure, I work for DDN, but the source of my paycheck has relatively little bearing on my technical opinions. All I?m saying is for me to honestly believe Lustre is worth another shot after the experiences I had years ago is significant. I do think it?s key to have a vendor behind you, vs rolling your own. I have seen that make a difference. 
I?m happy to take any further conversation/questions offline, I?m in no way trying to turn this into a marketing campaign. Sent from my iPhone > On Apr 17, 2020, at 07:02, Jonathan Buzzard wrote: > > ?On 16/04/2020 04:26, Flanders, Dean wrote: >> Hello All, >> As IBM has completely switched to capacity based licensing in order to use SS v5 I was wondering how others are dealing with this? We do not find the capacity based licensing sustainable. Our long term plan is to migrate away from SS v5 to Lustre, and based on the Lustre roadmap we have seen it should have the features we need within the next ~1 year (we are fortunate to have good contacts). > > The problem is the features of Lustre that are missing in GPFS :-) > > For example have they removed the Lustre feature where roughly biannually the metadata server kernel panics introducing incorrectable corruption into the file system that will within six months cause constant crashes of the metadata node to the point where the file system is unusable? > > In best slashdot car analogy GPFS is like driving round in a Aston Martin DB9, where Lustre is like having a Ford Pinto. You will never be happy with Pinto in my experience having gone from the DB9 to the Pinto and back to the DB9. > > That said if you use Lustre as a high performance scratch file system fro HPC and every ~6 months do a shutdown and upgrade, and at the same time reformat your Lustre file system you will be fine. > > Our experience with Lustre was so bad we specifically excluded it as an option for our current HPC system when it went out to tender. > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From carlz at us.ibm.com Fri Apr 17 14:15:07 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Fri, 17 Apr 2020 13:15:07 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction Message-ID: <82819CD0-0BF7-41A6-9896-32AF88744D4B@us.ibm.com> Dean Flanders: > Thanks for the clarification. I have always heard the term "existing customers" so originally I thought we were fine, > but this is the first time I have seen the term "existing systems". However, it seems what I said before is mostly correct, > eventually all customers will be forced to capacity based licensing as they life cycle hardware (even IBM customers). > In addition it seems there is a diminishing number of OEMs that can sell SS v5, which is what happened in our case when > we wanted to go from v4 to v5 with existing hardware (in our case DDN). So I strongly encourage organizations to be thinking > of these issues in their long term planning. Again, this isn?t quite correct, and I really want the archive of this thread to be completely correct when people review it in the future. As an existing customer of DDN, the problem GridScaler customers in particular are facing is not Sockets vs. Capacity. It is simply that DDN is not an OEM licensee for Scale V5. So DDN cannot upgrade your GridScaler to V5, *neither on Sockets nor on Capacity*. Then if you go to another supplier for V5, you are a new customer to that supplier. (Some of you out there are, I know, multi-sourcing your Scale systems, so may be an ?existing customer? of several Scale suppliers). 
And again, it is not correct that eventually all customers will be forced to capacity licensing. Those of you on Scale Standard and Scale Advanced software, which are not tied to specific systems or hardware, can continue on those licenses. There is no plan to require those people to migrate. By contrast, OEM licenses (and ESS licenses) were always sold as part of a system and attached to that system -- one of the things that makes those licenses cheaper than software licenses that live forever and float from system to system. It is also not true that there is a ?diminishing number of OEMs? selling V5. Everybody that sold V4 has added V5 to their contract, as far as I am aware -- except DDN. And we have added a number of additional OEMs in the past couple of years (some of them quite invisibly as Scale is embedded deep in their solution and they want their own brand front and center) and a couple more big names are in development that I can?t mention until they are ready to announce themselves. We also have a more diverse OEM model: as well as storage vendors that include Scale in a storage solution, we have various embedded vertical solutions, backup solutions, and cloud-based service offerings using Scale. Even Dell is selling a Scale solution now via our OEM Arcastream. Again, DDN and IBM are working together to find a path forward for GridScaler owners to get past this problem, and once again I ask for your patience as we get the details right. Regards Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_50537] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From steve.hindmarsh at crick.ac.uk Fri Apr 17 14:33:10 2020 From: steve.hindmarsh at crick.ac.uk (Steve Hindmarsh) Date: Fri, 17 Apr 2020 13:33:10 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> References: <48C781AA-BF81-4E8B-A290-C55A0C322DD4@us.ibm.com> Message-ID: Hi Carl, Thanks for the update which is very encouraging. I?m happy to sit tight and wait for an announcement. Best, Steve Steve Hindmarsh Head of Scientific Computing The Francis Crick Institute ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Carl Zetie - carlz at us.ibm.com Sent: Friday, April 17, 2020 1:16:38 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing - important correction Rob Horton wrote: >I'm not sure what the issue is between DDN and IBM (although I've heard various rumors) >but I really wish they would sort something out. Yes, it?s a pain. IBM and DDN are trying very hard to work something out, but it?s hard to get all the ?I?s dotted and ?T?s crossed with our respective legal and exec reviewers so that when we do say something it will be complete, clear, and final; and not require long, baroque threads for people to figure out where exactly they are? I wish I could say more, but I need to respect the confidentiality of the relationship and the live discussion. In the meantime, I thank you for your patience, and ask that you not believe any rumors you might hear, because whatever they are, they are wrong (or at least incomplete). 
In this situation, as a wise man once observed, ?those who Say don?t Know; those who Know don?t Say?. Regards, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_749317756] The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From jonathan.buzzard at strath.ac.uk Fri Apr 17 14:44:29 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 17 Apr 2020 14:44:29 +0100 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: References: Message-ID: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> On 17/04/2020 14:15, Aaron Knister wrote: > Yeah, I had similar experiences in the past (over a decade ago) with > Lustre and was heavily heavily anti-Lustre. That said, I just > finished several weeks of what I?d call grueling testing of DDN > Lustre and GPFS on the same hardware and I?m reasonably convinced > much of that is behind us now (things like stability, metadata > performance, random I/O performance just don?t appear to be issues > anymore and in some cases these operations are now faster in Lustre). Several weeks testing frankly does not cut the mustard to demonstrate stability. Our Lustre would run for months on end then boom, metadata server kernel panics. Sometimes but not always this would introduce the incorrectable file system corruption. You are going to need to have several years behind it to claim it is now stable. At this point I would note that basically a fsck on Lustre is not possible. Sure there is a somewhat complicated procedure for it, but firstly it is highly likely to take weeks to run, and even then it might not be able to actually fix the problem. > Full disclosure, I work for DDN, but the source of my paycheck has > relatively little bearing on my technical opinions. All I?m saying is > for me to honestly believe Lustre is worth another shot after the > experiences I had years ago is significant. I do think it?s key to > have a vendor behind you, vs rolling your own. I have seen that make > a difference. I?m happy to take any further conversation/questions > offline, I?m in no way trying to turn this into a marketing > campaign. Lustre is as of two years ago still behind GPFS 3.0 in terms of features and stability in my view. The idea it has caught up to GPFS 5.x in the last two years is in my view errant nonsense, software development just does not work like that. Let me put it another way, in our experience the loss of compute capacity from the downtime of Lustre exceeded the cost of GPFS licenses. That excludes the wage costs of researches twiddling their thumbs whilst the system was restored to working order. If I am being cynical if you can afford DDN storage in the first place stop winging about GPFS license costs. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From christophe.darras at atempo.com Fri Apr 17 15:00:10 2020 From: christophe.darras at atempo.com (Christophe Darras) Date: Fri, 17 Apr 2020 14:00:10 +0000 Subject: [gpfsug-discuss] Spectrum Scale licensing In-Reply-To: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> References: <52374047-db40-9b99-1f29-a5abdab146f3@strath.ac.uk> Message-ID: Hey Ladies and Gent, For some people here, it seems GPFS is like a religion? A lovely weekend to all of you, Kind Regards, Chris -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: vendredi 17 avril 2020 14:44 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale licensing On 17/04/2020 14:15, Aaron Knister wrote: > Yeah, I had similar experiences in the past (over a decade ago) with > Lustre and was heavily heavily anti-Lustre. That said, I just finished > several weeks of what I?d call grueling testing of DDN Lustre and GPFS > on the same hardware and I?m reasonably convinced much of that is > behind us now (things like stability, metadata performance, random I/O > performance just don?t appear to be issues anymore and in some cases > these operations are now faster in Lustre). Several weeks testing frankly does not cut the mustard to demonstrate stability. Our Lustre would run for months on end then boom, metadata server kernel panics. Sometimes but not always this would introduce the incorrectable file system corruption. You are going to need to have several years behind it to claim it is now stable. At this point I would note that basically a fsck on Lustre is not possible. Sure there is a somewhat complicated procedure for it, but firstly it is highly likely to take weeks to run, and even then it might not be able to actually fix the problem. > Full disclosure, I work for DDN, but the source of my paycheck has > relatively little bearing on my technical opinions. All I?m saying is > for me to honestly believe Lustre is worth another shot after the > experiences I had years ago is significant. I do think it?s key to > have a vendor behind you, vs rolling your own. I have seen that make a > difference. I?m happy to take any further conversation/questions > offline, I?m in no way trying to turn this into a marketing campaign. Lustre is as of two years ago still behind GPFS 3.0 in terms of features and stability in my view. The idea it has caught up to GPFS 5.x in the last two years is in my view errant nonsense, software development just does not work like that. Let me put it another way, in our experience the loss of compute capacity from the downtime of Lustre exceeded the cost of GPFS licenses. That excludes the wage costs of researches twiddling their thumbs whilst the system was restored to working order. If I am being cynical if you can afford DDN storage in the first place stop winging about GPFS license costs. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From yeep at robust.my Fri Apr 17 15:01:05 2020 From: yeep at robust.my (T.A. 
Yeep) Date: Fri, 17 Apr 2020 22:01:05 +0800 Subject: [gpfsug-discuss] Spectrum Scale licensing - important correction In-Reply-To: References: Message-ID: Hi JAB, Sound interesting, however, I'm actually a newcomer to Scale, I wish I could share the joy of mixing that. I guess maybe it is something similar to LSF RVU/UVUs? Thanks for sharing your experience anyway. Hi Carl, I just want to let you know that I have got your explanation, and I understand it now. Thanks. Not sure If I should always reply a "thank you" or "I've got it" in the mailing list, or better just do it privately. Same I'm new to mailing list too, so please let me know if I should not reply it publicly. On Fri, Apr 17, 2020 at 6:50 PM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > On 17/04/2020 11:31, T.A. Yeep wrote: > > Hi Carl, > > > > I'm confused here, in the previous email it was said *And for ESS, it is > > licensed Per Drive with different prices for HDDs and SSDs.* > > > > But then you mentioned in below email that: > > But new customers and new OEM systems are *all licensed by Capacity. > > This also applies to IBM's own ESS*: you can keep upgrading your old (if > > hardware is supported) gen 1 ESS on Sockets, but if you replace it with > > *a new ESS, that will come with capacity licenses*. > > > > Now the question, ESS is license per Drive or by capacity? > > > > Well by drive is "capacity" based licensing unless you have some sort of > magical infinite capacity drives :-) > > Under the PVU scheme if you know what you are doing you could game the > system. For example get a handful of servers get PVU licenses for them > create a GPFS file system handing off the back using say Fibre Channel > and cheap FC attached arrays (Dell MD3000 series springs to mind) and > then hang many PB off the back. I could using this scheme create a 100PB > filesystem for under a thousand PVU of GPFS server licenses. Add in > another cluster for protocol nodes and if you are not mounting on HPC > nodes that's a winner :-) > > In a similar manner I use a pimped out ancient Dell R300 with dual core > Xeon for backing up my GPFS filesystem because it's 100PVU of TSM > licensing and I am cheap, and besides it is more than enough grunt for > the job. A new machine would be 240 PVU minimum (4*70). I plan on > replacing the PERC SAS6 card with a H710 and new internal cabling to run > RHEL8 :-) > > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Best regards *T.A. Yeep*Mobile: 016-719 8506 | Tel/Fax: 03-6261 7237 | www.robusthpc.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Sat Apr 18 16:04:53 2020 From: ulmer at ulmer.org (Stephen Ulmer) Date: Sat, 18 Apr 2020 11:04:53 -0400 Subject: [gpfsug-discuss] gpfs filesets question In-Reply-To: References: Message-ID: Is this still true if the source and target fileset are both in the same storage pool? It seems like they could just move the metadata? Especially in the case of dependent filesets where the metadata is actually in the same allocation area for both the source and target. Maybe this just doesn?t happen often enough to optimize? 
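For anyone curious, a quick way to see which of the two actually happens on a given system is to compare inode numbers before and after the move (just a sketch -- the file system path and fileset junction names below are made up):

  cd /gpfs/fs1
  dd if=/dev/zero of=filesetA/testfile bs=1M count=16
  ls -i filesetA/testfile          # note the inode number
  mv filesetA/testfile filesetB/
  ls -i filesetB/testfile          # same inode = rename only; new inode = full copy + delete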
--
Stephen

> On Apr 16, 2020, at 12:50 PM, Oesterlin, Robert wrote:
>
> Moving data between filesets is like moving files between file systems. Normally when you move files between directories, it's simple metadata, but with filesets (dependent or independent) it is a full copy and delete of the old data.
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
>
> From: on behalf of "J. Eric Wonderley"
> Reply-To: gpfsug main discussion list
> Date: Thursday, April 16, 2020 at 11:32 AM
> To: gpfsug main discussion list
> Subject: [EXTERNAL] [gpfsug-discuss] gpfs filesets question
>
> I have filesets setup in a filesystem...looks like:
> [root at cl005 ~]# mmlsfileset home -L
> Filesets in file system 'home':
> Name          Id   RootInode  ParentId  Created                    InodeSpace   MaxInodes  AllocInodes  Comment
> root           0           3        --  Tue Jun 30 07:54:09 2015            0   402653184    320946176  root fileset
> hess           1   543733376         0  Tue Jun 13 14:56:13 2017            0           0            0
> predictHPC     2     1171116         0  Thu Jan  5 15:16:56 2017            0           0            0
> HYCCSIM        3   544258049         0  Wed Jun 14 10:00:41 2017            0           0            0
> socialdet      4   544258050         0  Wed Jun 14 10:01:02 2017            0           0            0
> arc            5     1171073         0  Thu Jan  5 15:07:09 2017            0           0            0
> arcadm         6     1171074         0  Thu Jan  5 15:07:10 2017            0           0            0
>
> I believe these are dependent filesets, dependent on the root fileset. Anyhow, a user wants to move a large amount of data from one fileset to another. Would this be a metadata-only operation? He has attempted to move a small amount of data and has noticed some thrashing.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From st.graf at fz-juelich.de  Mon Apr 20 09:29:17 2020
From: st.graf at fz-juelich.de (Stephan Graf)
Date: Mon, 20 Apr 2020 10:29:17 +0200
Subject: [gpfsug-discuss] gpfs filesets question
In-Reply-To:
References:
Message-ID:

Hi,

we recognized this behavior when we tried to move HSM-migrated files between filesets. This causes a recall, which is very annoying when the data are afterwards stored on the same pools and have to be migrated back to tape.

@IBM: should we open an RFE to address this?

Stephan

Am 18.04.2020 um 17:04 schrieb Stephen Ulmer:
> Is this still true if the source and target fileset are both in the same
> storage pool? It seems like they could just move the metadata?
> Especially in the case of dependent filesets where the metadata is
> actually in the same allocation area for both the source and target.
>
> Maybe this just doesn't happen often enough to optimize?
>
> --
> Stephen
>
>> On Apr 16, 2020, at 12:50 PM, Oesterlin, Robert wrote:
>>
>> Moving data between filesets is like moving files between file
>> systems. Normally when you move files between directories, it's simple
>> metadata, but with filesets (dependent or independent) it is a full copy
>> and delete of the old data.
>> Bob Oesterlin
>> Sr Principal Storage Engineer, Nuance
>> *From:* on behalf of "J. Eric Wonderley"
>> *Reply-To:* gpfsug main discussion list
>> *Date:* Thursday, April 16, 2020 at 11:32 AM
>> *To:* gpfsug main discussion list
>> *Subject:* [EXTERNAL] [gpfsug-discuss] gpfs filesets question
>> I have filesets setup in a filesystem...looks like:
>> [root at cl005 ~]# mmlsfileset home -L
>> Filesets in file system 'home':
>> Name          Id   RootInode  ParentId  Created                    InodeSpace   MaxInodes  AllocInodes  Comment
>> root           0           3        --  Tue Jun 30 07:54:09 2015            0   402653184    320946176  root fileset
>> hess           1   543733376         0  Tue Jun 13 14:56:13 2017            0           0            0
>> predictHPC     2     1171116         0  Thu Jan  5 15:16:56 2017            0           0            0
>> HYCCSIM        3   544258049         0  Wed Jun 14 10:00:41 2017            0           0            0
>> socialdet      4   544258050         0  Wed Jun 14 10:01:02 2017            0           0            0
>> arc            5     1171073         0  Thu Jan  5 15:07:09 2017            0           0            0
>> arcadm         6     1171074         0  Thu Jan  5 15:07:10 2017            0           0            0
>>
>> I believe these are dependent filesets, dependent on the root
>> fileset. Anyhow, a user wants to move a large amount of data from one
>> fileset to another. Would this be a metadata-only operation? He has
>> attempted to move a small amount of data and has noticed some thrashing.
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5360 bytes
Desc: S/MIME Cryptographic Signature
URL:

From olaf.weiser at de.ibm.com  Mon Apr 20 11:54:06 2020
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Mon, 20 Apr 2020 10:54:06 +0000
Subject: [gpfsug-discuss] gpfs filesets question
In-Reply-To:
References: ,
Message-ID:

An HTML attachment was scrubbed...
URL:

From skariapaul at gmail.com  Wed Apr 22 04:40:28 2020
From: skariapaul at gmail.com (PS K)
Date: Wed, 22 Apr 2020 11:40:28 +0800
Subject: [gpfsug-discuss] S3, S3A & S3n support
Message-ID:

Hi,

Does the SS object protocol support S3A and S3N?

Regards
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pinto at scinet.utoronto.ca  Wed Apr 22 09:19:10 2020
From: pinto at scinet.utoronto.ca (Jaime Pinto)
Date: Wed, 22 Apr 2020 04:19:10 -0400
Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21)
Message-ID: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca>

In case you missed it (the forum has been pretty quiet about this one), CVE-2020-4273 had an update yesterday:

https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E

If you can't do the upgrade now, at least apply the mitigation to the client nodes generally exposed to unprivileged users:

Check the setuid bit:
ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}'

Apply the mitigation:
ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s /usr/lpp/mmfs/bin/"$9)}'

Verification:
ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}'

All the best
Jaime

.
.
.
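P.S. If you prefer find over the ls/awk pipeline above, an equivalent way to list and then strip the setuid bits would be (a sketch, not taken from the advisory itself):

  # list anything under the GPFS bin directory that is still setuid
  find /usr/lpp/mmfs/bin -perm -4000 -ls
  # and to remove the bit in one pass
  find /usr/lpp/mmfs/bin -perm -4000 -exec chmod u-s {} +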
          ************************************
           TELL US ABOUT YOUR SUCCESS STORIES
          http://www.scinethpc.ca/testimonials
          ************************************
---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477

From st.graf at fz-juelich.de  Wed Apr 22 10:02:59 2020
From: st.graf at fz-juelich.de (Stephan Graf)
Date: Wed, 22 Apr 2020 11:02:59 +0200
Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21)
In-Reply-To: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca>
References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca>
Message-ID:

Hi,

I took a look at the "Readme and Release notes for release 5.0.4.3 IBM Spectrum Scale 5.0.4.3 Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme", but I could not find an entry for APAR IJ23438, which is the APAR referenced ("For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438") on the "Security Bulletin: A vulnerability has been identified in IBM Spectrum Scale where an unprivileged user could execute commands as root (CVE-2020-4273)" page.

Shouldn't it be mentioned there?

Stephan

Am 22.04.2020 um 10:19 schrieb Jaime Pinto:
> In case you missed it (the forum has been pretty quiet about this one),
> CVE-2020-4273 had an update yesterday:
>
> https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E
>
> If you can't do the upgrade now, at least apply the mitigation to the
> client nodes generally exposed to unprivileged users:
>
> Check the setuid bit:
> ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}'
>
> Apply the mitigation:
> ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s /usr/lpp/mmfs/bin/"$9)}'
>
> Verification:
> ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}'
>
> All the best
> Jaime
>
> .
> .
> .        ************************************
>           TELL US ABOUT YOUR SUCCESS STORIES
>          http://www.scinethpc.ca/testimonials
>          ************************************
> ---
> Jaime Pinto - Storage Analyst
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca - www.computecanada.ca
> University of Toronto
> 661 University Ave. (MaRS), Suite 1140
> Toronto, ON, M5G1M1
> P: 416-978-2755
> C: 416-505-1477
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5360 bytes
Desc: S/MIME Cryptographic Signature
URL:

From knop at us.ibm.com  Wed Apr 22 16:42:54 2020
From: knop at us.ibm.com (Felipe Knop)
Date: Wed, 22 Apr 2020 15:42:54 +0000
Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21)
In-Reply-To:
References: , <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca>
Message-ID:

An HTML attachment was scrubbed...
URL: From thakur.hpc at gmail.com Wed Apr 22 19:23:53 2020 From: thakur.hpc at gmail.com (Bhupender thakur) Date: Wed, 22 Apr 2020 11:23:53 -0700 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Has IBM released or does IBM plan to release a fix in the 5.0.3.x branch? On Wed, Apr 22, 2020 at 8:45 AM Felipe Knop wrote: > Stephan, > > Security bulletins need to go through an internal process, including legal > review. In addition, we are normally required to ensure the fix is > available for all releases before the security bulletin can be published. > Because of that, we normally don't list details for security fixes in > either the readmes or APARs, since the information can only be disclosed in > the bulletin itself. > > ---- > The bulletin below has: > > If you cannot apply the latest level of service, contact IBM Service for > an efix: > > - For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438 > > - For IBM Spectrum Scale V4.2.0.0 through V4.2.3.20, reference APAR > IJ23426 > "V5.0.0.0 through V5.0.4.1" should have been "V5.0.0.0 through V5.0.4.2". > (I have asked the text to be corrected) > > > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Stephan Graf > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 5:04 AM > > Hi > > I took a lookat the "Readme and Release notes for release 5.0.4.3 IBM > Spectrum Scale 5.0.4.3 > Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" > But I did not find the entry which mentioned the "For IBM Spectrum Scale > V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is > mentioned on the "Security Bulletin: A vulnerability has been identified > in IBM Spectrum Scale where an unprivileged user could execute commands > as root ( CVE-2020-4273)" page. > > shouldn't it be mentioned there? > > Stephan > > > Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > > In case you missed (the forum has been pretty quiet about this one), > > CVE-2020-4273 had an update yesterday: > > > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > > > > If you can't do the upgrade now, at least apply the mitigation to the > > client nodes generally exposed to unprivileged users: > > > > Check the setuid bit: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > Apply the mitigation: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > > /usr/lpp/mmfs/bin/"$9)}' > > > > Verification: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > All the best > > Jaime > > > > . > > . > > . ************************************ > > TELL US ABOUT YOUR SUCCESS STORIES > > http://www.scinethpc.ca/testimonials > > ************************************ > > --- > > Jaime Pinto - Storage Analyst > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > 661 University Ave. 
(MaRS), Suite 1140 > > Toronto, ON, M5G1M1 > > P: 416-978-2755 > > C: 416-505-1477 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Wed Apr 22 21:05:49 2020 From: knop at us.ibm.com (Felipe Knop) Date: Wed, 22 Apr 2020 20:05:49 +0000 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: , <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: An HTML attachment was scrubbed... URL: From thakur.hpc at gmail.com Wed Apr 22 21:47:30 2020 From: thakur.hpc at gmail.com (Bhupender thakur) Date: Wed, 22 Apr 2020 13:47:30 -0700 Subject: [gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) In-Reply-To: References: <4b396469-b1c7-69bc-157c-23bbe62d847a@scinet.utoronto.ca> Message-ID: Thanks for the clarification Felipe. On Wed, Apr 22, 2020 at 1:06 PM Felipe Knop wrote: > Bhupender, > > PTFs for the 5.0.3 branch are no longer produced (as is the case for > 5.0.2, 5.0.1, and 5.0.0), but efixes for 5.0.3 can be requested. When > requesting the efix, please indicate the APAR number listed in bulletin > below, as well as the location of the bulletin itself, just in case: > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Bhupender thakur > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 2:24 PM > > Has IBM released or does IBM plan to release a fix in the 5.0.3.x branch? > > On Wed, Apr 22, 2020 at 8:45 AM Felipe Knop wrote: > > Stephan, > > Security bulletins need to go through an internal process, including legal > review. In addition, we are normally required to ensure the fix is > available for all releases before the security bulletin can be published. > Because of that, we normally don't list details for security fixes in > either the readmes or APARs, since the information can only be disclosed in > the bulletin itself. > > ---- > The bulletin below has: > > If you cannot apply the latest level of service, contact IBM Service for > an efix: > > - For IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, reference APAR IJ23438 > > - For IBM Spectrum Scale V4.2.0.0 through V4.2.3.20, reference APAR > IJ23426 > "V5.0.0.0 through V5.0.4.1" should have been "V5.0.0.0 through V5.0.4.2". 
> (I have asked the text to be corrected) > > > > Felipe > > ---- > Felipe Knop knop at us.ibm.com > GPFS Development and Security > IBM Systems > IBM Building 008 > 2455 South Rd, Poughkeepsie, NY 12601 > (845) 433-9314 T/L 293-9314 > > > > > ----- Original message ----- > From: Stephan Graf > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: > Cc: > Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS vulnerability with possible > root exploit on versions prior to 5.0.4.3 (and 4.2.3.21) > Date: Wed, Apr 22, 2020 5:04 AM > > Hi > > I took a lookat the "Readme and Release notes for release 5.0.4.3 IBM > Spectrum Scale 5.0.4.3 > Spectrum_Scale_Data_Management-5.0.4.3-x86_64-Linux Readme" > But I did not find the entry which mentioned the "For IBM Spectrum Scale > V5.0.0.0 through V5.0.4.1, reference APAR IJ23438" APAR number which is > mentioned on the "Security Bulletin: A vulnerability has been identified > in IBM Spectrum Scale where an unprivileged user could execute commands > as root ( CVE-2020-4273)" page. > > shouldn't it be mentioned there? > > Stephan > > > Am 22.04.2020 um 10:19 schrieb Jaime Pinto: > > In case you missed (the forum has been pretty quiet about this one), > > CVE-2020-4273 had an update yesterday: > > > > > https://www.ibm.com/support/pages/node/6151701?myns=s033&mynp=OCSTXKQY&mync=E&cm_sp=s033-_-OCSTXKQY-_-E > > > > > > > If you can't do the upgrade now, at least apply the mitigation to the > > client nodes generally exposed to unprivileged users: > > > > Check the setuid bit: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > Apply the mitigation: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s > > /usr/lpp/mmfs/bin/"$9)}' > > > > Verification: > > ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l > > /usr/lpp/mmfs/bin/"$9)}') > > > > All the best > > Jaime > > > > . > > . > > . ************************************ > > TELL US ABOUT YOUR SUCCESS STORIES > > http://www.scinethpc.ca/testimonials > > ************************************ > > --- > > Jaime Pinto - Storage Analyst > > SciNet HPC Consortium - Compute/Calcul Canada > > www.scinet.utoronto.ca - www.computecanada.ca > > University of Toronto > > 661 University Ave. (MaRS), Suite 1140 > > Toronto, ON, M5G1M1 > > P: 416-978-2755 > > C: 416-505-1477 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Apr 22 23:34:33 2020 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 22 Apr 2020 22:34:33 +0000 Subject: [gpfsug-discuss] Is there a difference in suspend and empty NSD state? 
Message-ID: Hello all, Looking at the man page, it is fairly ambiguous as to these NSD states actually being different (and if not WHY have to names for the same thing?!): suspend or empty Instructs GPFS to stop allocating space on the specified disk. Put a disk in this state when you are preparing to remove the file system data from the disk or if you want to prevent new data from being put on the disk. This is a user-initiated state that GPFS never enters without an explicit command to change the disk state. Existing data on a suspended disk may still be read or updated. A disk remains in a suspended or to be emptied state until it is explicitly resumed. Restarting GPFS or rebooting nodes does not restore normal access to a suspended disk. And from the examples lower in the page: Note: In product versions earlier than V4.1.1, the mmlsdisk command lists the disk status as suspended. In product versions V4.1.1 and later, the mmlsdisk command lists the disk status as to be emptied with both mmchdisk suspend or mmchdisk empty commands. And really what I currently want to do is suspend a set of disks, and then mark a different set of disks as "to be emptied". Then I will run a mmrestripefs operation to move the data off of the "to be emptied" disks, but not onto the suspended disks (which will also be removed from the file system in the near future). Once the NSDs are emptied then it will be a very (relatively) fast mmdeldisk operation. So is that possible? As you can likely tell, I don't have enough space to just delete both sets of disks at once during a (yay!) full file system migration to the new GPFS 5.x version. Thought this might be useful to others, so posted here. Thanks in advance neighbors! -Bryan -------------- next part -------------- An HTML attachment was scrubbed... URL: From brnelson at us.ibm.com Thu Apr 23 00:49:13 2020 From: brnelson at us.ibm.com (Brian Nelson) Date: Wed, 22 Apr 2020 18:49:13 -0500 Subject: [gpfsug-discuss] S3, S3A & S3n support In-Reply-To: References: Message-ID: The Spectrum Scale Object protocol only has support for the traditional S3 object storage. -Brian =================================== Brian Nelson IBM Spectrum Scale brnelson at us.ibm.com ----- Original message ----- From: PS K Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: [EXTERNAL] [gpfsug-discuss] S3, S3A & S3n support Date: Wed, Apr 22, 2020 12:03 AM Hi, Does SS object protocol support S3a and S3n? Regards Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Apr 23 11:33:34 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 23 Apr 2020 18:33:34 +0800 Subject: [gpfsug-discuss] =?utf-8?q?Is_there_a_difference_in_suspend_and_e?= =?utf-8?q?mpty_NSD=09state=3F?= In-Reply-To: References: Message-ID: Option 'suspend' is same to 'empty' if the cluster is updated to Scale 4.1.1. The option 'empty' was introduced in 4.1.1 to support disk deletion in a fast way, 'suspend' option was not removed with due consideration for previous users. > And really what I currently want to do is suspend a set of disks, > and then mark a different set of disks as ?to be emptied?. Then I > will run a mmrestripefs operation to move the data off of the ?to be > emptied? disks, but not onto the suspended disks (which will also be > removed from the file system in the near future). Once the NSDs are > emptied then it will be a very (relatively) fast mmdeldisk > operation. 
So is that possible? It's possible only if these two sets of disks belong to two different pools . If they are in the same pool, restripefs on the pool will migrate all data off these two sets of disks. If they are in two different pools, you can use mmrestripefs with -P option to migrate data off "suspended" and "to be emptied" disks in the specified data pool. Please note that system pool is special, mmrestripefs will unconditionally restripe the system pool even you specified -P option to a data pool. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. gpfsug-discuss-bounces at spectrumscale.org wrote on 2020/04/23 06:34:33: > From: Bryan Banister > To: gpfsug main discussion list > Date: 2020/04/23 06:35 > Subject: [EXTERNAL] [gpfsug-discuss] Is there a difference in > suspend and empty NSD state? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > Hello all, > > Looking at the man page, it is fairly ambiguous as to these NSD > states actually being different (and if not WHY have to names for > the same thing?!): > > suspend > or > empty > Instructs GPFS to stop allocating space on the specified > disk. Put a disk in this state when you are preparing to > remove the file system data from the disk or if you want > to prevent new data from being put on the disk. This is > a user-initiated state that GPFS never enters without an > explicit command to change the disk state. Existing data > on a suspended disk may still be read or updated. > > A disk remains in a suspended or to be > emptied state until it is explicitly resumed. > Restarting GPFS or rebooting nodes does not restore > normal access to a suspended disk. > > And from the examples lower in the page: > Note: In product versions earlier than V4.1.1, the > mmlsdisk command lists the disk status as > suspended. In product versions V4.1.1 and later, the > mmlsdisk command lists the disk status as to be > emptied with both mmchdisk suspend or mmchdisk > empty commands. > > > And really what I currently want to do is suspend a set of disks, > and then mark a different set of disks as ?to be emptied?. Then I > will run a mmrestripefs operation to move the data off of the ?to be > emptied? disks, but not onto the suspended disks (which will also be > removed from the file system in the near future). Once the NSDs are > emptied then it will be a very (relatively) fast mmdeldisk > operation. So is that possible? > > As you can likely tell, I don?t have enough space to just delete > both sets of disks at once during a (yay!) full file system > migration to the new GPFS 5.x version. > > Thought this might be useful to others, so posted here. Thanks in > advance neighbors! > -Bryan_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url? 
> u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx- > siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=QxEYrybXOI6xpUEVxZumWQYDMDbDLx4O4vrm0PNotMw&s=4M2- > uNMOrvL7kEQu_UmL5VvnkKfPL-EpSapVGkSX1jc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlz at us.ibm.com Thu Apr 23 13:55:43 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Thu, 23 Apr 2020 12:55:43 +0000 Subject: [gpfsug-discuss] S3, S3A & S3n support Message-ID: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> From PS K: >Does SS object protocol support S3a and S3n? Can you share some more details of your requirements, use case, etc., either here on the list or privately with me? We?re currently looking at the strategic direction of our S3 support. As Brian said, today it?s strictly the ?traditional? S3 protocol, but we are evaluating where to go next. Thanks, Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_219535040] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From skariapaul at gmail.com Fri Apr 24 09:24:53 2020 From: skariapaul at gmail.com (PS K) Date: Fri, 24 Apr 2020 16:24:53 +0800 Subject: [gpfsug-discuss] S3, S3A & S3n support In-Reply-To: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> References: <4BD10FED-F735-4E9D-A04A-7D5C1AD7C598@us.ibm.com> Message-ID: This is for spark integration which supports only s3a. Cheers On Thu, Apr 23, 2020 at 8:55 PM Carl Zetie - carlz at us.ibm.com < carlz at us.ibm.com> wrote: > From PS K: > > >Does SS object protocol support S3a and S3n? > > > > Can you share some more details of your requirements, use case, etc., > either here on the list or privately with me? > > > > We?re currently looking at the strategic direction of our S3 support. As > Brian said, today it?s strictly the ?traditional? S3 protocol, but we are > evaluating where to go next. > > > > Thanks, > > > > Carl Zetie > > Program Director > > Offering Management > > Spectrum Scale > > ---- > > (919) 473 3318 ][ Research Triangle Park > > carlz at us.ibm.com > > [image: signature_219535040] > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: not available URL: From TROPPENS at de.ibm.com Mon Apr 27 10:28:59 2020 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Mon, 27 Apr 2020 09:28:59 +0000 Subject: [gpfsug-discuss] Chart decks of German User Meeting are now available Message-ID: An HTML attachment was scrubbed... URL: From andi at christiansen.xxx Tue Apr 28 07:34:37 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 28 Apr 2020 08:34:37 +0200 (CEST) Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? Message-ID: <239358449.52194.1588055677577@privateemail.com> Hi All, Can anyone share some thoughts on how to tune AFM for stability? 
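(For reference, the gateway state when this happens can be checked roughly like this -- a sketch, using the file system and fileset names from the settings below:)

  mmafmctl fs02 getstate -j AFMFILESET    # fileset state and queue length per gateway
  mmhealth node show AFM                  # AFM component health on the gateway node
  mmfsadm dump afm                        # detailed gateway dump, included further down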
at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ? Cache Site only: TCP Settings: sunrpc.tcp_slot_table_entries = 128 Home and Cache: AFM / GPFS Settings: maxBufferDescs=163840 afmHardMemThreshold=25G afmMaxWriteMergeLen=30G Cache fileset: Attributes for fileset AFMFILESET: ================================ Status Linked Path /mnt/fs02/AFMFILESET Id 1 Root inode 524291 Parent Id 0 Created Tue Apr 14 15:57:43 2020 Comment Inode space 1 Maximum number of inodes 10000384 Allocated inodes 10000384 Permission change flag chmodAndSetacl afm-associated Yes Target nfs://DK_VPN/mnt/fs01/AFMFILESET Mode single-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Read Threads per Gateway 64 Parallel Read Chunk Size 128 Parallel Read Threshold 1024 Number of Gateway Flush Threads 48 Prefetch Threshold 0 (default) Eviction Enabled yes (default) Parallel Write Threshold 1024 Parallel Write Chunk Size 128 Number of Write Threads per Gateway 16 IO Flags 0 (default) mmfsadm dump afm: AFM Gateway: RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 readBypassThresh 67108864 QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 Ping thread: Started Fileset: AFMFILESET 1 (fs02) mode: single-writer queue: Normal MDS: QMem 0 CTL 577 home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 handler: Mounted Dirty refCount: 1 queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 i/o: prefetchThresh 0 (Prefetch) Mnt status: 0:0 1:0 2:0 3:0 Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ Priority Queue: Empty (state: Active) Normal Queue: Empty (state: Active) Cluster Config Cache: maxFilesToCache 131072 maxStatCache 524288 afmDIO 2 afmIOFlags 4096 maxReceiverThreads 32 afmNumReadThreads 64 afmNumWriteThreads 8 afmHardMemThreshold 26843545600 maxBufferDescs 163840 afmMaxWriteMergeLen 32212254720 workerThreads 1024 The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. 
If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) Many Thanks in Advance! Andi Christiansen -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Tue Apr 28 11:57:48 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Tue, 28 Apr 2020 12:57:48 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup Message-ID: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Hi, when the gpfs systemd service returns from startup the filesystems are usually not mounted. So having another service depending on gpfs is not feasible if you require the filesystem(s). Therefore we have added a script to the systemd gpfs service that waits for all local gpfs filesystems being mounted. We have added that script via ExecStartPost: ------------------------------------------------------------ # cat /etc/systemd/system/gpfs.service.d/waitmount.conf [Service] ExecStartPost=/usr/local/sc-gpfs/sbin/wait-for-all_local-mounts.sh TimeoutStartSec=200 ------------------------------------------------------------- The script itself is not doing much: ------------------------------------------------------------- #!/bin/bash # # wait until all _local_ gpfs filesystems are mounted. It ignored # filesystems where mmlsfs -A does not report "yes". # # returns 0 if all fs are mounted (or none are found in gpfs configuration) # returns non-0 otherwise # wait for max. TIMEOUT seconds TIMEOUT=180 # leading space is required! FS=" $(/usr/lpp/mmfs/bin/mmlsfs all_local -Y 2>/dev/null | grep :automaticMountOption:yes: | cut -d: -f7 | xargs; exit ${PIPESTATUS[0]})" # RC=1 and no output means there are no such filesystems configured in GPFS [ $? -eq 1 ] && [ "$FS" = " " ] && exit 0 # uncomment this line for testing #FS="$FS gpfsdummy" while [ $TIMEOUT -gt 0 ]; do for fs in ${FS}; do if findmnt $fs -n &>/dev/null; then FS=${FS/ $fs/} continue 2; fi done [ -z "${FS// /}" ] && break (( TIMEOUT -= 5 )) sleep 5 done if [ -z "${FS// /}" ]; then exit 0 else echo >&2 "ERROR: filesystem(s) not found in time:${FS}" exit 2 fi -------------------------------------------------- This works without problems on _most_ of our clusters. However, not on all. Some of them show what I believe is a race condition and fail to startup after a reboot: ---------------------------------------------------------------------- # journalctl -u gpfs -- Logs begin at Fri 2020-04-24 17:11:26 CEST, end at Tue 2020-04-28 12:47:34 CEST. -- Apr 24 17:12:13 myhost systemd[1]: Starting General Parallel File System... Apr 24 17:12:17 myhost mmfs[5720]: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg. Apr 24 17:13:44 myhost systemd[1]: gpfs.service start-post operation timed out. Stopping. Apr 24 17:13:44 myhost mmremote[8966]: Shutting down! Apr 24 17:13:48 myhost mmremote[8966]: Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfs26 Apr 24 17:13:48 myhost mmremote[8966]: Unloading module mmfslinux Apr 24 17:13:48 myhost systemd[1]: Failed to start General Parallel File System. Apr 24 17:13:48 myhost systemd[1]: Unit gpfs.service entered failed state. Apr 24 17:13:48 myhost systemd[1]: gpfs.service failed. 
----------------------------------------------------------------------

The mmfs.log shows a bit more:

----------------------------------------------------------------------
# less /var/adm/ras/mmfs.log.previous
2020-04-24_17:12:14.609+0200: runmmfs starting (4254)
2020-04-24_17:12:14.622+0200: [I] Removing old /var/adm/ras/mmfs.log.* files:
2020-04-24_17:12:14.658+0200: runmmfs: [I] Unloading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra
2020-04-24_17:12:14.692+0200: runmmfs: [I] Unloading module mmfs26
2020-04-24_17:12:14.901+0200: runmmfs: [I] Unloading module mmfslinux
2020-04-24_17:12:15.018+0200: runmmfs: [I] Unloading module tracedev
2020-04-24_17:12:15.057+0200: runmmfs: [I] Loading modules from /lib/modules/3.10.0-1062.18.1.el7.x86_64/extra
Module              Size  Used by
mmfs26           2657452  0
mmfslinux         809734  1 mmfs26
tracedev           48618  2 mmfs26,mmfslinux
2020-04-24_17:12:16.720+0200: Node rebooted. Starting mmautoload...
2020-04-24_17:12:17.011+0200: [I] This node has a valid standard license
2020-04-24_17:12:17.011+0200: [I] Initializing the fast condition variables at 0x5561DFC365C0 ...
2020-04-24_17:12:17.011+0200: [I] mmfsd initializing. {Version: 5.0.4.2   Built: Jan 27 2020 12:13:06} ...
2020-04-24_17:12:17.011+0200: [I] Cleaning old shared memory ...
2020-04-24_17:12:17.012+0200: [I] First pass parsing mmfs.cfg ...
2020-04-24_17:12:17.013+0200: [X] Cannot open configuration file /var/mmfs/gen/mmfs.cfg.
2020-04-24_17:12:20.667+0200: mmautoload: Starting GPFS ...
2020-04-24_17:13:44.846+0200: mmremote: Initiating GPFS shutdown ...
2020-04-24_17:13:47.861+0200: mmremote: Starting the mmsdrserv daemon ...
2020-04-24_17:13:47.955+0200: mmremote: Unloading GPFS kernel modules ...
2020-04-24_17:13:48.165+0200: mmremote: Completing GPFS shutdown ...
--------------------------------------------------------------------------

Starting the gpfs service again manually then works without problems. Interestingly the missing mmfs.cfg _is there_ after the shutdown; it gets created shortly after the failure. That's why I am assuming a race condition:

--------------------------------------------------------------------------
# stat /var/mmfs/gen/mmfs.cfg
  File: '/var/mmfs/gen/mmfs.cfg'
  Size: 408             Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 268998265   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:var_t:s0
Access: 2020-04-27 17:12:19.801060073 +0200
Modify: 2020-04-24 17:12:17.617823441 +0200
Change: 2020-04-24 17:12:17.659823405 +0200
 Birth: -
--------------------------------------------------------------------------

Now, the interesting part:

- removing the ExecStartPost script makes the issue vanish. Reboot is always starting gpfs successfully
- reducing the ExecStartPost to simply one line ("exit 0") makes the issue stay. gpfs startup always fails.

Unfortunately IBM is refusing support because "the script is not coming with gpfs".

So I am searching for a solution that makes the script work on those servers again. Or a better way to wait for all local gpfs mounts being ready. Has anyone written something like that already?

Thank you,

Uli

-- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr.
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From stockf at us.ibm.com Tue Apr 28 12:30:38 2020 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 28 Apr 2020 11:30:38 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Tue Apr 28 12:37:24 2020 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 28 Apr 2020 17:07:24 +0530 Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? In-Reply-To: <239358449.52194.1588055677577@privateemail.com> References: <239358449.52194.1588055677577@privateemail.com> Message-ID: Hi, What is lock down of AFM fileset ? Are the messages in requeued state and AFM won't replicate any data ? I would recommend opening a ticket by collecting the logs and internaldump from the gateway node when the replication is stuck. You can also try increasing the value of afmAsyncOpWaitTimeout option and see if this solves the issue. mmchconfig afmAsyncOpWaitTimeout=3600 -i ~Venkat (vpuvvada at in.ibm.com) From: Andi Christiansen To: "gpfsug-discuss at spectrumscale.org" Date: 04/28/2020 12:04 PM Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Can anyone share some thoughts on how to tune AFM for stability? at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ?
Cache Site only: TCP Settings: sunrpc.tcp_slot_table_entries = 128 Home and Cache: AFM / GPFS Settings: maxBufferDescs=163840 afmHardMemThreshold=25G afmMaxWriteMergeLen=30G Cache fileset: Attributes for fileset AFMFILESET: ================================ Status Linked Path /mnt/fs02/AFMFILESET Id 1 Root inode 524291 Parent Id 0 Created Tue Apr 14 15:57:43 2020 Comment Inode space 1 Maximum number of inodes 10000384 Allocated inodes 10000384 Permission change flag chmodAndSetacl afm-associated Yes Target nfs://DK_VPN/mnt/fs01/AFMFILESET Mode single-writer File Lookup Refresh Interval 30 (default) File Open Refresh Interval 30 (default) Dir Lookup Refresh Interval 60 (default) Dir Open Refresh Interval 60 (default) Async Delay 15 (default) Last pSnapId 0 Display Home Snapshots no Number of Read Threads per Gateway 64 Parallel Read Chunk Size 128 Parallel Read Threshold 1024 Number of Gateway Flush Threads 48 Prefetch Threshold 0 (default) Eviction Enabled yes (default) Parallel Write Threshold 1024 Parallel Write Chunk Size 128 Number of Write Threads per Gateway 16 IO Flags 0 (default) mmfsadm dump afm: AFM Gateway: RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 readBypassThresh 67108864 QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 Ping thread: Started Fileset: AFMFILESET 1 (fs02) mode: single-writer queue: Normal MDS: QMem 0 CTL 577 home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 handler: Mounted Dirty refCount: 1 queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 i/o: prefetchThresh 0 (Prefetch) Mnt status: 0:0 1:0 2:0 3:0 Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ Priority Queue: Empty (state: Active) Normal Queue: Empty (state: Active) Cluster Config Cache: maxFilesToCache 131072 maxStatCache 524288 afmDIO 2 afmIOFlags 4096 maxReceiverThreads 32 afmNumReadThreads 64 afmNumWriteThreads 8 afmHardMemThreshold 26843545600 maxBufferDescs 163840 afmMaxWriteMergeLen 32212254720 workerThreads 1024 The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) Many Thanks in Advance! 
Andi Christiansen _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=-XbtU1ILcqI_bUurDD3j1j-oqGszcNZAbQVIhQ5EZOs&s=IjrGy-VdY1cuNfy0bViEykWMEVDax7_xvrMdRhQ2QkM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Apr 28 12:38:01 2020 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 28 Apr 2020 12:38:01 +0100 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> On 28/04/2020 11:57, Ulrich Sibiller wrote: > Hi, > > when the gpfs systemd service returns from startup the filesystems are > usually not mounted. So having another service depending on gpfs is not > feasible if you require the filesystem(s). > > Therefore we have added a script to the systemd gpfs service that waits > for all local gpfs filesystems being mounted. We have added that script > via ExecStartPost: > Yuck, and double yuck. There are many things you can say about systemd (and I have a choice few) but one of them is that it makes this sort of hackery obsolete. At least that is one of it goals. A systemd way to do it would be via one or more helper units. So lets assume your GPFS file system is mounted on /gpfs, then create a file called ismounted.txt on it and then create a unit called say gpfs_mounted.target that looks like # gpfs_mounted.target [Unit] TimeoutStartSec=infinity ConditionPathExists=/gpfs/ismounted.txt ExecStart=/usr/bin/sleep 10 RemainAfterExit=yes Then the main unit gets Wants=gpfs_mounted.target After=gpfs_mounted.target If you are using scripts in systemd you are almost certainly doing it wrong :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From juergen.hannappel at desy.de Tue Apr 28 12:55:50 2020 From: juergen.hannappel at desy.de (Hannappel, Juergen) Date: Tue, 28 Apr 2020 13:55:50 +0200 (CEST) Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> Message-ID: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Hi, a gpfs.mount target should be automatically created at boot by the systemd-fstab-generator from the fstab entry, so no need with hackery like ismountet.txt... ----- Original Message ----- > From: "Jonathan Buzzard" > To: gpfsug-discuss at spectrumscale.org > Sent: Tuesday, 28 April, 2020 13:38:01 > Subject: Re: [gpfsug-discuss] wait for mount during gpfs startup > Yuck, and double yuck. There are many things you can say about systemd > (and I have a choice few) but one of them is that it makes this sort of > hackery obsolete. At least that is one of it goals. > > A systemd way to do it would be via one or more helper units. 
So lets > assume your GPFS file system is mounted on /gpfs, then create a file > called ismounted.txt on it and then create a unit called say > gpfs_mounted.target that looks like > > > # gpfs_mounted.target > [Unit] > TimeoutStartSec=infinity > ConditionPathExists=/gpfs/ismounted.txt > ExecStart=/usr/bin/sleep 10 > RemainAfterExit=yes > > Then the main unit gets > > Wants=gpfs_mounted.target > After=gpfs_mounted.target > > If you are using scripts in systemd you are almost certainly doing it > wrong :-) > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From carlz at us.ibm.com Tue Apr 28 13:10:56 2020 From: carlz at us.ibm.com (Carl Zetie - carlz@us.ibm.com) Date: Tue, 28 Apr 2020 12:10:56 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup (Ulrich Sibiller) Message-ID: There?s an RFE related to this: RFE 125955 (https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955) I recommend that people add their votes and comments there as well as discussing it here in the UG. Carl Zetie Program Director Offering Management Spectrum Scale ---- (919) 473 3318 ][ Research Triangle Park carlz at us.ibm.com [signature_1027147421] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 69558 bytes Desc: image001.png URL: From andi at christiansen.xxx Tue Apr 28 13:25:37 2020 From: andi at christiansen.xxx (Andi Christiansen) Date: Tue, 28 Apr 2020 14:25:37 +0200 (CEST) Subject: [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? In-Reply-To: References: <239358449.52194.1588055677577@privateemail.com> Message-ID: <467674858.57941.1588076737138@privateemail.com> Hi Venkat, The AFM fileset becomes totally unresponsive from all nodes within the cluster and the only way to resolve it is to do a "mmshutdown" and wait 2 mins, then "mmshutdown" again as it cannot really do it the first time.. and then a "mmstartup" then all is back to normal and AFM is stopped and can be started again for another week or so.. mmafmctl stop -j will just hang endless.. i will try to set that value and see if that does anything for us :) Thanks! Best Regards Andi Christiansen > On April 28, 2020 1:37 PM Venkateswara R Puvvada wrote: > > > Hi, > > What is lock down of AFM fileset ? Are the messages in requeued state and AFM won't replicate any data ? I would recommend opening a ticket by collecting the logs and internaldump from the gateway node when the replication is stuck. > > You can also try increasing the value of afmAsyncOpWaitTimeout option and see if this solves the issue. > > mmchconfig afmAsyncOpWaitTimeout=3600 -i > > ~Venkat (vpuvvada at in.ibm.com) > > > > From: Andi Christiansen > To: "gpfsug-discuss at spectrumscale.org" > Date: 04/28/2020 12:04 PM > Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability? > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > --------------------------------------------- > > > > Hi All, > > Can anyone share some thoughts on how to tune AFM for stability? 
at the moment we have ok performance between our sites (5-8Gbits with 34ms latency) but we encounter a lock down of the cache fileset from week to week, which was day to day before we tuned below settings.. is there any way to tune AFM further i haven't found ? > > > Cache Site only: > TCP Settings: > sunrpc.tcp_slot_table_entries = 128 > > > Home and Cache: > AFM / GPFS Settings: > maxBufferDescs=163840 > afmHardMemThreshold=25G > afmMaxWriteMergeLen=30G > > > Cache fileset: > Attributes for fileset AFMFILESET: > ================================ > Status Linked > Path /mnt/fs02/AFMFILESET > Id 1 > Root inode 524291 > Parent Id 0 > Created Tue Apr 14 15:57:43 2020 > Comment > Inode space 1 > Maximum number of inodes 10000384 > Allocated inodes 10000384 > Permission change flag chmodAndSetacl > afm-associated Yes > Target nfs://DK_VPN/mnt/fs01/AFMFILESET > Mode single-writer > File Lookup Refresh Interval 30 (default) > File Open Refresh Interval 30 (default) > Dir Lookup Refresh Interval 60 (default) > Dir Open Refresh Interval 60 (default) > Async Delay 15 (default) > Last pSnapId 0 > Display Home Snapshots no > Number of Read Threads per Gateway 64 > Parallel Read Chunk Size 128 > Parallel Read Threshold 1024 > Number of Gateway Flush Threads 48 > Prefetch Threshold 0 (default) > Eviction Enabled yes (default) > Parallel Write Threshold 1024 > Parallel Write Chunk Size 128 > Number of Write Threads per Gateway 16 > IO Flags 0 (default) > > > mmfsadm dump afm: > AFM Gateway: > RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072 > readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 > readBypassThresh 67108864 > QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600 > Ping thread: Started > Fileset: AFMFILESET 1 (fs02) > mode: single-writer queue: Normal MDS: QMem 0 CTL 577 > home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16 > handler: Mounted Dirty refCount: 1 > queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0 > remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0 > queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78 > handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0 > lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200 > i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64 > i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824 > i/o: prefetchThresh 0 (Prefetch) > Mnt status: 0:0 1:0 2:0 3:0 > Export Map: 10.110.5.10/ 10.110.5.11/ 10.110.5.12/ 10.110.5.13/ > Priority Queue: Empty (state: Active) > Normal Queue: Empty (state: Active) > > > Cluster Config Cache: > maxFilesToCache 131072 > maxStatCache 524288 > afmDIO 2 > afmIOFlags 4096 > maxReceiverThreads 32 > afmNumReadThreads 64 > afmNumWriteThreads 8 > afmHardMemThreshold 26843545600 > maxBufferDescs 163840 > afmMaxWriteMergeLen 32212254720 > workerThreads 1024 > > > The entries in the gpfs log states "AFM: Home is taking longer to respond..." but its only AFM and the Cache AFM fileset which enteres a locked state. we have the same NFS exports from home mounted on the same gateway nodes to check when a file is transferred and they are all ok while the AFM lock is happening. a simple gpfs restart of the AFM Master node is enough to make AFM restart and continue for another week.. 
> > > The home target is exported through CES NFS from 4 CES nodes and a map is created at the Cache site to utilize the ParallelWrites feature. > > > If there is anyone sitting around with some ideas/knowledge on how to tune this further for more stability then i would be happy if you could share your thoughts about it! :-) > > > Many Thanks in Advance! > Andi Christiansen > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Tue Apr 28 14:57:36 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Tue, 28 Apr 2020 06:57:36 -0700 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> Message-ID: <20200428135736.3zqcvvupj2ipvjfw@illiuin> We use callbacks successfully to ensure Linux auditd rules are only loaded after GPFS is mounted. It was easy to setup, and there's very fine-grained events that you can trigger on: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/bl1adm_mmaddcallback.htm On Tue, Apr 28, 2020 at 11:30:38AM +0000, Frederick Stock wrote: > Have you looked a the mmaddcallback command and specifically the file system mount callbacks? -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From novosirj at rutgers.edu Tue Apr 28 17:33:34 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 28 Apr 2020 16:33:34 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Message-ID: <2F49D93E-18CA-456D-9815-ACB581A646B7@rutgers.edu> Has anyone confirmed this? At one point, I mucked around with this somewhat endlessly to try to get something sane and systemd-based to work and ultimately surrendered and inserted a 30 second delay. I didn?t try the ?check for the presence of a file? thing as I?m allergic to that sort of thing (at least more allergic than I am to a time-based delay). I believe everything that I tried happens before the mount is complete. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Apr 28, 2020, at 7:55 AM, Hannappel, Juergen wrote: > > Hi, > a gpfs.mount target should be automatically created at boot by the > systemd-fstab-generator from the fstab entry, so no need with hackery like > ismountet.txt... > > > ----- Original Message ----- >> From: "Jonathan Buzzard" >> To: gpfsug-discuss at spectrumscale.org >> Sent: Tuesday, 28 April, 2020 13:38:01 >> Subject: Re: [gpfsug-discuss] wait for mount during gpfs startup > >> Yuck, and double yuck. There are many things you can say about systemd >> (and I have a choice few) but one of them is that it makes this sort of >> hackery obsolete. At least that is one of it goals. 
>> >> A systemd way to do it would be via one or more helper units. So lets >> assume your GPFS file system is mounted on /gpfs, then create a file >> called ismounted.txt on it and then create a unit called say >> gpfs_mounted.target that looks like >> >> >> # gpfs_mounted.target >> [Unit] >> TimeoutStartSec=infinity >> ConditionPathExists=/gpfs/ismounted.txt >> ExecStart=/usr/bin/sleep 10 >> RemainAfterExit=yes >> >> Then the main unit gets >> >> Wants=gpfs_mounted.target >> After=gpfs_mounted.target >> >> If you are using scripts in systemd you are almost certainly doing it >> wrong :-) >> >> JAB. >> >> -- >> Jonathan A. Buzzard Tel: +44141-5483420 >> HPC System Administrator, ARCHIE-WeSt. >> University of Strathclyde, John Anderson Building, Glasgow. G4 0NG >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Tue Apr 28 18:32:25 2020 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 28 Apr 2020 17:32:25 +0000 Subject: [gpfsug-discuss] wait for mount during gpfs startup (Ulrich Sibiller) In-Reply-To: References: Message-ID: I?ve also voted and commented on the ticket, but I?ll say this here: If the amount of time I spent on this alone (and I like to think I?m pretty good with this sort of thing, and am somewhat of a systemd evangelist when the opportunity presents itself), this has caused a lot of people a lot of pain ? including time spent when their kludge to make this work causes some other problem, or having to reboot nodes in a much more manual way at times to ensure one of these nodes doesn?t dump work while it has no FS, etc. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Apr 28, 2020, at 8:10 AM, Carl Zetie - carlz at us.ibm.com wrote: > > There?s an RFE related to this: RFE 125955 (https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955) > > I recommend that people add their votes and comments there as well as discussing it here in the UG. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at spectrumscale.org Wed Apr 29 22:29:34 2020 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Wed, 29 Apr 2020 22:29:34 +0100 Subject: [gpfsug-discuss] THINK Virtual User Group Day Message-ID: <5BE5B210-5FEE-45E0-AC0D-1B184B5B8E45@spectrumscale.org> Hi All, As part of IBM?s THINK digital event, there will be a virtual user group day. This isn?t an SSUG event, though we?ve been involved in some of the discussion about the topics for the event. Three of the four Storage sessions are focussed on Spectrum Scale. For storage this will be taking place on May 19th. Details of how to register for this event and the planned sessions are below (though I guess are still subject to change). 
Separately to this, the SSUG organisers are still in discussion about how we might present some sort of digital SSUG event, it won?t be a half/full day of talks, but likely a series of talks ? but we?re still working through the details with Ulf and the IBM team about how it might work. And if you are interested in THINK, this is free to register for this year as a digital only event https://www.ibm.com/events/think ? I promise this is my only reference to THINK ? Simon The registration site for the user group day is https://ibm-usergroups.bemyapp.com/ Storage Session 1 Title IBM Spectrum Scale: Use Cases and Field Lessons-learned with Kubernetes and OpenShift Abstract IBM Spectrum Scale user group leaders will discuss how to deploy IBM Spectrum Scale using Kubernetes and OpenShift, persistent volumes, IBM Storage Enabler for Containers, Kubernetes FlexVolume Drivers and IBM Spectrum Connect. We'll review real-world IBM Spectrum Scale use cases including advanced driver assistance systems (ADAS), cloud service providers (CSP), dev/test and multi-cloud. We'll also review most often-requested client topics including unsupported CSI platforms, security, multi-tenancy and how to deploy Spectrum Scale in heterogenous environments such as x86, IBM Power, and IBM Z by using IBM Cloud Private and OpenShift. Finally we'll describe IBM resources such as regional storage competency centers, training, testing labs and IBM Lab Services. Presenter Harald Seipp, Senior Technical Staff Member, Center of Excellence for Cloud Storage Storage Session 2 Title How to Efficiently Manage your Hadoop and Analytics Workflow with IBM Spectrum Scale Abstract This in-depth technical talk will compare traditional Hadoop vs. IBM Spectrum Scale through Hadoop Distributed File System (HDFS) on IBM Spectrum Scale, HDFS storage tiering & federation, HDFS backup, using IBM Spectrum Scale as an ingest tier, next generation workloads, disaster recovery and fault-tolerance using a single stretch cluster or multiple clusters using active file management (AFM), as well as HDFS integration within Cluster Export Services (CES). Presenter Andreas Koeninger, IBM Spectrum Scale Big Data and Analytics Storage Session 3 Title IBM Spectrum Scale: How to enable AI Workloads with OpenShift and IBM Spectrum Scale Abstract IBM Spectrum Scale user group leaders will deliver a in-depth technical presentation covering the enterprise AI data pipeline from ingest to insights, how to manage workloads at scale, how to integrate OpenShift 4.x and IBM Spectrum Scale 5.0.4.1, as well as preparing and installing the IBM Spectrum Scale CSI driver in OpenShift. We will also cover Kubernetes/OpenShift persistent volumes and use cases for provisioning with IBM Spectrum Scale CSI for AI workloads. Finally we will feature a demo of IBM Spectrum Scale CSI and TensorFlow in OpenShift 4.x. Presenters Gero Schmidt, IBM Spectrum Scale Development, Big Data Analytics Solutions Przemyslaw Podfigurny, IBM Spectrum Scale Development, AI/ML Big Data and Analytics Storage Session 4 Title Journey to Modern Data Protection for a Large Manufacturing Client Abstract In this webinar, we will discuss how industrial manufacturing organizations are addressing data protection. We will look at why holistic data protection is a critical infrastructure component and how modernization can provide a foundation for the future. 
We will share how customers are leveraging the IBM Spectrum Protect portfolio to address their IT organization's data protection, business continuity with software-defined data protection solutions. We will discuss various applications including data reuse, as well as providing instant access to data which can help an organization be more agile and reduce downtime. Presenters Adam Young, Russell Dwire -------------- next part -------------- An HTML attachment was scrubbed... URL: From u.sibiller at science-computing.de Thu Apr 30 11:50:27 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 12:50:27 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <20200428135736.3zqcvvupj2ipvjfw@illiuin> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: Am 28.04.20 um 15:57 schrieb Skylar Thompson: >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > We use callbacks successfully to ensure Linux auditd rules are only loaded > after GPFS is mounted. It was easy to setup, and there's very fine-grained > events that you can trigger on: Thanks. But how do set this up for a systemd service? Disable the dependent service and start it from the callback? Create some kind of state file in the callback and let the dependent systemd service check that flag file in a busy loop? Use inotify for the flag file? Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Thu Apr 30 11:50:39 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 12:50:39 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> <792994589.3526899.1588074950106.JavaMail.zimbra@desy.de> Message-ID: <4c9f3acc-cfc7-05a5-eca5-2054c67c0cc4@science-computing.de> Am 28.04.20 um 13:55 schrieb Hannappel, Juergen: > a gpfs.mount target should be automatically created at boot by the > systemd-fstab-generator from the fstab entry, so no need with hackery like > ismountet.txt... A generic gpfs.mount target does not seem to exist on my system. There are only specific mount targets for the mounted gpfs filesystems. So I'd need to individually configure each depend service on each system with the filesystem for wait for. My approach was more general in just waiting for all_local gpfs filesystems. So I can use the same configuration everywhere. Besides, I have once tested and found that these targets are not usable because of some oddities but unfortunately I don't remember details. But the outcome was my script from the initial post. Maybe it was that there's no automatic mount target for all_local, same problem as above. Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From u.sibiller at science-computing.de Thu Apr 30 12:14:07 2020 From: u.sibiller at science-computing.de (Ulrich Sibiller) Date: Thu, 30 Apr 2020 13:14:07 +0200 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <277171dd-9297-98e4-1e18-352fb49ef96f@strath.ac.uk> Message-ID: Am 28.04.20 um 13:38 schrieb Jonathan Buzzard: > Yuck, and double yuck. There are many things you can say about systemd > (and I have a choice few) but one of them is that it makes this sort of > hackery obsolete. At least that is one of it goals. > > A systemd way to do it would be via one or more helper units. So lets > assume your GPFS file system is mounted on /gpfs, then create a file > called ismounted.txt on it and then create a unit called say > gpfs_mounted.target that looks like > > > # gpfs_mounted.target > [Unit] > TimeoutStartSec=infinity > ConditionPathExists=/gpfs/ismounted.txt > ExecStart=/usr/bin/sleep 10 > RemainAfterExit=yes > > Then the main unit gets > > Wants=gpfs_mounted.target > After=gpfs_mounted.target > > If you are using scripts in systemd you are almost certainly doing it > wrong :-) Yes, that the right direction. But still not the way I'd like it to be. First, I don't really like the flag file stuff. Imagine the mess you'd create if multiple services would require flag files... Second, I am looking for an all_local target. That one cannot be solved using this approach, right? (same for all_remote or all) Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 From scale at us.ibm.com Thu Apr 30 12:40:57 2020 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 30 Apr 2020 07:40:57 -0400 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de><20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: I now better understand the functionality you were aiming to achieve. You want anything in systemd that is dependent on GPFS file systems being mounted to block until they are mounted. Currently we do not offer any such feature though as Carl Zetie noted there is an RFE for such functionality, RFE 125955 ( https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125955 ). For the mmaddcallback what I was thinking could resolve your problem was for you to create a either a "startup" callback or "mount" callbacks for your file systems. I thought you could use those callbacks to track the file systems of interest and then use the appropriate means to integrate that information into the flow of systemd. I have never done this so perhaps it is not possible. 
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ulrich Sibiller To: gpfsug-discuss at spectrumscale.org Date: 04/30/2020 06:57 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] wait for mount during gpfs startup Sent by: gpfsug-discuss-bounces at spectrumscale.org Am 28.04.20 um 15:57 schrieb Skylar Thompson: >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > We use callbacks successfully to ensure Linux auditd rules are only loaded > after GPFS is mounted. It was easy to setup, and there's very fine-grained > events that you can trigger on: Thanks. But how do set this up for a systemd service? Disable the dependent service and start it from the callback? Create some kind of state file in the callback and let the dependent systemd service check that flag file in a busy loop? Use inotify for the flag file? Uli -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=KmkFZ30Ey3pB4QnhsP2vS2mmojVLAWGrIiStGaE0320&s=VHWoLbiq119iFhL724WAQwg4dSJ3KRNVSXnfrFBv9RQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Thu Apr 30 14:43:28 2020 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 30 Apr 2020 06:43:28 -0700 Subject: [gpfsug-discuss] wait for mount during gpfs startup In-Reply-To: References: <3b659f74-07d7-f49f-51c3-dae1f65966b6@science-computing.de> <20200428135736.3zqcvvupj2ipvjfw@illiuin> Message-ID: <20200430134328.7qshqlrptw6hquls@illiuin> On Thu, Apr 30, 2020 at 12:50:27PM +0200, Ulrich Sibiller wrote: > Am 28.04.20 um 15:57 schrieb Skylar Thompson: > >> Have you looked a the mmaddcallback command and specifically the file system mount callbacks? > > > We use callbacks successfully to ensure Linux auditd rules are only loaded > > after GPFS is mounted. It was easy to setup, and there's very fine-grained > > events that you can trigger on: > > Thanks. But how do set this up for a systemd service? Disable the dependent service and start it > from the callback? Create some kind of state file in the callback and let the dependent systemd > service check that flag file in a busy loop? Use inotify for the flag file? 
In the pre-systemd days, I would say just disable the service and let the callback handle it. I do see your point, though, that you lose the other systemd ordering benefits if you start the service from the callback. Assuming you're still able to start the service via systemctl, I would probably just leave it disabled and let the callback handle it. In the case of auditd rules, it's not actually a service (just a command that needs to be run) so we didn't run into this specific problem. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine