From andreas.mattsson at maxiv.lu.se Tue Oct 1 07:33:35 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Tue, 1 Oct 2019 06:33:35 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: , Message-ID: Hi, I've tried increasing all the refresh intervals, but even at 300 seconds, there is very little performance increase. The job runs in several steps, and gets held up at two places, as far as I can see. First at a kind of parallelisation step where about 1000-3000 files are created in the current working folder on a single compute node, and then at a step where lots of small output files are written on each of the compute nodes involved in the job. Comparing with running the same data set on a non-AFM cache fileset in the same storage system, it runs at least a factor 5 slower, even with really high refresh intervals. In the Scale documentation, it states that the afmRefreshAsync is only configurable cluster wide. Is it also configurable on a per-fileset level? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.3/com.ibm.spectrum.scale.v5r03.doc/bl1adm_configurationparametersAFM.htm The software is XDS, http://xds.mpimf-heidelberg.mpg.de/ Unfortunately it is a closed source software, so it is not possible to adapt the software. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Venkateswara R Puvvada Skickat: den 27 september 2019 10:23:13 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [cid:_4_DB7D1BA8DB7D1920002E115D65258482] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
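For reference, a minimal sketch of how the settings discussed in this thread can be inspected; gpfs01 and lab_cache are purely illustrative filesystem and fileset names, and the option combinations should be checked against the documentation for your release:
# on the cache cluster: list the AFM attributes of the cache fileset,
# including the afm*RefreshInterval values being tuned in this thread
mmlsfileset gpfs01 lab_cache --afm -L
# check whether afmRefreshAsync has been set cluster-wide
mmlsconfig afmRefreshAsync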
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ATT00001.png Type: image/png Size: 4232 bytes Desc: ATT00001.png URL: From leonardo.sala at psi.ch Tue Oct 1 08:03:05 2019 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Tue, 1 Oct 2019 09:03:05 +0200 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: Dear all, we have similar issues on our CES cluster, and we do have 5.0.2-1. Could anybody from IBM confirm that with 5.0.2-2 this issue should not be there anymore? Should we go for 5.0.2-2 or is there a better release? One thing we noticed: when we had the "empty ls" issue, which means: - on CES NFSv3 export, a directory is wrongly reported as empty, while - on kernel NFS export, this does not happen if I do an ls on that directory on the CES export node, then magically the empty dir issue disappears from all NFS clients, at least the ones attached on that node. Is this compatible with the behaviour described on the other sites? thanks! cheers leo Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 04.01.19 10:09, Andreas Mattsson wrote: > > Just reporting back that the issue we had seems to have been solved. > In our case it was fixed by applying hotfix-packages from IBM. Did > this in December and I can no longer trigger the issue. Hopefully, > it'll stay fixed when we get full production load on the system again > now in January. > > Also, as far as I can see, it looks like Scale 5.0.2.2 includes these > packages already. > > > Regards, > > Andreas mattsson > -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Oct 1 13:34:38 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 1 Oct 2019 12:34:38 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> References: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Message-ID: Hello, I wanted to completely disable IPv6 to get ganesha to use IPv4 sockets only. Once we did set the sysctl configs to disable IPv6 *and* did rebuild the initramfs.*.img file to include the new settings IPv6 was completely gone and ganesha did open an IPv4 socket only. We missed to rebuild the initramfs.*.img file in the first trial. Rpcbind/ganesha failed to start without the initramfs rebuild. Cheers, Heiner Some related documents from netapp https://access.redhat.com/solutions/8709#?rhel7disable https://access.redhat.com/solutions/2798411 https://access.redhat.com/solutions/2963091 From: on behalf of "Billich Heinrich Rainer (ID SD)" Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 17:51 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hello Olaf, Thank you, so we?ll try to get rid of IPv6. Actually we do have this settings active but I may have to add them to the initrd file, too. (See https://access.redhat.com/solutions/8709#?rhel7disable) to prevent ganesha from opening an IPv6 socket. It?s probably no big issue if ganesha uses IPv4overIPv6 for all connections, but to keep things simple I would like to avoid it. 
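A minimal sketch of the sequence described above, assuming RHEL 7 with dracut; the sysctl file name is illustrative:
# make the IPv6-disable sysctl settings persistent
cat > /etc/sysctl.d/90-disable-ipv6.conf <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
sysctl --system
# rebuild the initramfs so the setting is already in place when rpcbind/ganesha start
dracut -f
# after a reboot, verify that ganesha no longer listens on IPv6 sockets
ss -l -t -6 -p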
@Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 1 16:15:00 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 1 Oct 2019 15:15:00 +0000 Subject: [gpfsug-discuss] verbsPortsOutOfOrder Message-ID: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From TOMP at il.ibm.com Wed Oct 2 11:53:59 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 2 Oct 2019 13:53:59 +0300 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=dwtQhITjaULogq0l7wR3LfWDiy4R6tpPWq81EvnuA_o&s=LyZT2j0hkAP9pJTkYU40ZkexzkG6RFRqDcS9rSrapRc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Oct 2 18:02:06 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 2 Oct 2019 13:02:06 -0400 Subject: [gpfsug-discuss] CIFS protocol access does not honor secondary groups Message-ID: After converting from clustered CIFS to CES protocols, we?ve noticed that SMB users can?t access files owned by groups that they are members of, unless that group happens to be their primary group. Have read the smb.conf man page, and don?t see anything obvious that would control this? What might we be missing? Thanks, ? ddj Dave Johnson Brown University CCV/CIS From frederik.ferner at diamond.ac.uk Wed Oct 2 19:41:14 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 2 Oct 2019 19:41:14 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Message-ID: <0a5f042f-2715-c436-34a1-27c0ba529a70@diamond.ac.uk> Hello Heiner, very interesting, thanks. In our case we are seeing this problem on gpfs.nfs-ganesha-gpfs-2.5.3-ibm036.05.el7, so close to the version where you're seeing it. Frederik On 23/09/2019 10:33, Billich Heinrich Rainer (ID SD) wrote: > Hello Frederik, > > Thank you. I now see a similar behavior: Ganesha has 500k open files while the node is suspended since 2+hours. I would expect that some cleanup job does remove most of the open FD after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: > > ! maxFilesToCache 1048576 > ! maxStatCache 2097152 > > Our ganesha version is 2.5.3. (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern. 
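A hedged sketch of the kind of check described in this thread; the daemon process name is assumed to be ganesha.nfsd (it may appear as gpfs.ganesha.nfsd depending on packaging):
# count the file descriptors currently held by the NFS-Ganesha daemon
ls /proc/$(pgrep -f ganesha.nfsd | head -1)/fd | wc -l
# confirm the Spectrum Scale cache settings quoted above
mmdiag --config | grep -iE 'maxFilesToCache|maxStatCache'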
> > I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. > > Cheers, > Heiner > > Daniel Gryniewicz > So, it's not impossible, based on the workload, but it may also be a bug. > > For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know > when the client closes the FD, and opening/closing all the time causes a > large performance hit. So, we cache open FDs. > > All handles in MDCACHE live on the LRU. This LRU is divided into 2 > levels. Level 1 is more active handles, and they can have open FDs. > Various operations can demote a handle to level 2 of the LRU. As part of > this transition, the global FD on that handle is closed. Handles that > are actively in use (have a refcount taken on them) are not eligible for > this transition, as the FD may be being used. > > We have a background thread that runs, and periodically does this > demotion, closing the FDs. This thread runs more often when the number > of open FDs is above FD_HwMark_Percent of the available number of FDs, > and runs constantly when the open FD count is above FD_Limit_Percent of > the available number of FDs. > > So, a heavily used server could definitely have large numbers of FDs > open. However, there have also, in the past, been bugs that would > either keep the FDs from being closed, or would break the accounting (so > they were closed, but Ganesha still thought they were open). You didn't > say what version of Ganesha you're using, so I can't tell if one of > those bugs applies. > > Daniel > > On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: > > Heiner, > > we are seeing similar issues with CES/ganesha NFS, in our case it > exclusively with NFSv3 clients. > > What is maxFilesToCache set to on your ganesha node(s)? In our case > ganesha was running into the limit of open file descriptors because > maxFilesToCache was set at a low default and for now we've increased it > to 1M. > > It seemed that ganesha was never releasing files even after clients > unmounted the file system. > > We've only recently made the change, so we'll see how much that improved > the situation. > > I thought we had a reproducer but after our recent change, I can now no > longer successfully reproduce the increase in open files not being released. > > Kind regards, > Frederik > > On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > > Hello, > > > > Is it usual to see 200'000-400'000 open files for a single ganesha > > process? Or does this indicate that something is wrong? > > > > We have some issues with ganesha (on spectrum scale protocol nodes) > > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > > have a large number of open files, 200'000-400'000 open files per daemon > > (and 500 threads and about 250 client connections). Other nodes have > > 1'000 - 10'000 open files by ganesha only and don't show the issue. > > > > If someone could explain how ganesha decides which files to keep open > > and which to close that would help, too. As NFSv3 is stateless the > > client doesn't open/close a file, it's the server to decide when to > > close it? We do have a few NFSv4 clients, too. > > > > Are there certain access patterns that can trigger such a large number > > of open file? Maybe traversing and reading a large number of small files? > > > > Thank you, > > > > Heiner > > > > I did count the open files by counting the entries in /proc/<pid of ganesha>/fd/ .
With several 100k entries I failed to do a 'ls -ls' to > > list all the symbolic links, hence I can't relate the open files to > > different exports easily. > > > > I did post this to the ganesha mailing list, too. > > > > -- > > > > ======================= > > > > Heinrich Billich > > > > ETH Zürich > > > > Informatikdienste > > > > Tel.: +41 44 632 72 56 > > > > heinrich.billich at id.ethz.ch > > > > ======================== > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Frederik Ferner Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 Duty Sys Admin can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) From kkr at lbl.gov Thu Oct 3 01:01:39 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 2 Oct 2019 17:01:39 -0700 Subject: [gpfsug-discuss] Slides from last US event, hosting and speaking at events and plan for next events Message-ID: Hi all, The slides from the UG event at NERSC/LBNL are making their way here: https://www.spectrumscaleug.org/presentations/ Most of them are already in place. Thanks to all who attended, presented and participated. It's great when we have interactive discussions at these events. We'd like to ask you, as GPFS/Spectrum Scale users, to consider hosting a future UG event at your site or giving a site update. I've been asked *many times*, why aren't there more site updates? So you tell me: is there a barrier that I'm not aware of? We're a friendly group (really!) and want to hear about your successes and your failures. We all learn from each other. Let me know if you have any thoughts about this. As a reminder, there is an upcoming Australian event and 2 upcoming US events Australia - Sydney October 18th https://www.spectrumscaleug.org/event/spectrum-scale-user-group-at-ibm-systems-technical-university-australia/ US - NYC October 10th https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -
SC19 at Denver November 17th - This year we will include a morning session for new users and lunch. Online agenda will be available soon. https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Any feedback for the agendas for these events, or in general, please let us know. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruben.cremades at roche.com Thu Oct 3 08:18:03 2019 From: ruben.cremades at roche.com (Cremades, Ruben) Date: Thu, 3 Oct 2019 09:18:03 +0200 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Thanks Tomer, I have opened TS002806998 Regards Ruben On Wed, Oct 2, 2019 at 12:54 PM Tomer Perry wrote: > Simon, > > It looks like its setting the Out Of Order MLX5 environmental parameter: > > *https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs* > > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/10/2019 18:17 > Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, > > In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. > Could anyone comment on what that might do and if it relates to the > ordering that ?verbsPorts? are set? > > Thanks > > Simon_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Rub?n Cremades Science Infrastructure F.Hoffmann-La Roche Ltd. Bldg 254 / Room 04 - NBH01 Wurmisweg 4303 - Kaiseraugst Phone: +41-61-687 26 25 ruben.cremades at roche.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Oct 3 10:14:01 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 3 Oct 2019 09:14:01 +0000 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Thanks Tomer. That makes sense, also not something I think we need to worry about ? I assume that relates to hypercube or dragonfly or some such though the Mellanox docs only say ?some topologies? 
Simon From: on behalf of "TOMP at il.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 2 October 2019 at 11:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] verbsPortsOutOfOrder Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Oct 3 10:17:15 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 3 Oct 2019 09:17:15 +0000 Subject: [gpfsug-discuss] CIFS protocol access does not honor secondary groups In-Reply-To: References: Message-ID: This works for us, so it's something that should work. It's probably related to the way your authentication is setup, we used to use custom from before IBM supporting AD+LDAP and we had to add entries for the group SID in the LDAP server also, but since moving to "supported" way of doing this, we don't think we need this anymore.. You might want to do some digging with the wbinfo command and see if groups/SIDs resolve both ways, but I'd suggest opening a PMR on this. You could also check what file-permissions look like with mmgetacl. In the past we've seen some funkiness where creator/owner isn't on/inherited, so if the user owns the file/directory but the permission is to the group rather than directly the user, they can create new files but then not read them afterwards (though other users in the group can). I forget the exact details as we worked a standard inheritable ACL that works for us __ Simon ?On 02/10/2019, 18:02, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: After converting from clustered CIFS to CES protocols, we?ve noticed that SMB users can?t access files owned by groups that they are members of, unless that group happens to be their primary group. Have read the smb.conf man page, and don?t see anything obvious that would control this? What might we be missing? Thanks, ? ddj Dave Johnson Brown University CCV/CIS _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TOMP at il.ibm.com Thu Oct 3 10:44:32 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 3 Oct 2019 12:44:32 +0300 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Simon, I believe that adaptive routing might also introduce out of order packets - but I would ask Mellanox as to when they recommend to use it. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: gpfsug main discussion list Date: 03/10/2019 12:14 Subject: [EXTERNAL] Re: [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Tomer. That makes sense, also not something I think we need to worry about ? I assume that relates to hypercube or dragonfly or some such though the Mellanox docs only say ?some topologies? Simon From: on behalf of "TOMP at il.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 2 October 2019 at 11:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] verbsPortsOutOfOrder Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=rn4emIykuWgljnk6nj_Ay8TFU177BWp8qeaVAjmenfM&s=dO3QHcwm0oVHnHKGtdwIi2Q8mXWvL6JPmU7aVuRRMx0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Oct 3 14:55:19 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 3 Oct 2019 13:55:19 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: , , Message-ID: <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> After further investigaion, it seems like this XDS software is using memory mapped io when operating on the files. Is it possible that MMAP IO has a higher performance hit by AFM than regular file access? /Andreas ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Andreas Mattsson Skickat: den 1 oktober 2019 08:33:35 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, I've tried increasing all the refresh intervals, but even at 300 seconds, there is very little performance increase. The job runs in several steps, and gets held up at two places, as far as I can see. 
First at a kind of parallelisation step where about 1000-3000 files are created in the current working folder on a single compute node, and then at a step where lots of small output files are written on each of the compute nodes involved in the job. Comparing with running the same data set on a non-AFM cache fileset in the same storage system, it runs at least a factor 5 slower, even with really high refresh intervals. In the Scale documentation, it states that the afmRefreshAsync is only configurable cluster wide. Is it also configurable on a per-fileset level? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.3/com.ibm.spectrum.scale.v5r03.doc/bl1adm_configurationparametersAFM.htm The software is XDS, http://xds.mpimf-heidelberg.mpg.de/ Unfortunately it is a closed source software, so it is not possible to adapt the software. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Venkateswara R Puvvada Skickat: den 27 september 2019 10:23:13 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [cid:_4_DB7D1BA8DB7D1920002E115D65258482] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ATT00001.png Type: image/png Size: 4232 bytes Desc: ATT00001.png URL: From christof.schmitt at us.ibm.com Thu Oct 3 17:02:17 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 3 Oct 2019 16:02:17 +0000 Subject: [gpfsug-discuss] =?utf-8?q?CIFS_protocol_access_does_not_honor_se?= =?utf-8?q?condary=09groups?= In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Oct 3 18:15:04 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 3 Oct 2019 17:15:04 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Oct 3 18:31:34 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 3 Oct 2019 17:31:34 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From will.schmied at stjude.org Thu Oct 3 19:59:22 2019 From: will.schmied at stjude.org (Schmied, Will) Date: Thu, 3 Oct 2019 18:59:22 +0000 Subject: [gpfsug-discuss] Job: HPC Storage Architect at St. Jude Message-ID: <277C9DAD-06A2-4BD9-906F-83BFDDCDD965@stjude.org> Happy almost Friday everyone, St. Jude Children?s Research Hospital (Memphis, TN) has recently posted a job opening for a HPC Storage Architect, a senior level position working primarily to operate and maintain multiple Spectrum Scale clusters in support of research and other HPC workloads. You can view the job posting, and begin your application, here: http://myjob.io/nd6qd You can find all jobs, and information about working at St. Jude, here: https://www.stjude.org/jobs/hospital.html Please feel free to contact me directly off list if you have any questions. I?ll also be at SC this year and hope to see you there. Thanks, Will ________________________________ Email Disclaimer: www.stjude.org/emaildisclaimer Consultation Disclaimer: www.stjude.org/consultationdisclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Fri Oct 4 06:49:35 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 4 Oct 2019 05:49:35 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Fri Oct 4 07:32:42 2019 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Fri, 4 Oct 2019 08:32:42 +0200 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. 
Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: > >> @Malahal: Looks like you have written the netgroup caching code, > feel free to ask for further details if required. > Hi Ulrich, Ganesha uses innetgr() call for netgroup information and > sssd has too many issues in its implementation. Redhat said that they > are going to fix sssd synchronization issues in RHEL8. It is in my > plate to serialize innergr() call in Ganesha to match kernel NFS > server usage! I expect the sssd issue to give EACCESS/EPERM kind of > issue but not EINVAL though. > If you are using sssd, you must be getting into a sssd issue. > Ganesha?has a host-ip cache fix in 5.0.2 PTF3. Please make sure you > use ganesha version?V2.5.3-ibm030.01 if you are using netgroups > (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) > Regards, Malahal. > > ----- Original message ----- > From: Ulrich Sibiller > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS > Date: Thu, Dec 13, 2018 7:32 PM > On 23.11.2018 14:41, Andreas Mattsson wrote: > > Yes, this is repeating. > > > > We?ve ascertained that it has nothing to do at all with file > operations on the GPFS side. > > > > Randomly throughout the filesystem mounted via NFS, ls or file > access will give > > > > ? > > > > ?> ls: reading directory /gpfs/filessystem/test/testdir: Invalid > argument > > > > ? > > > > Trying again later might work on that folder, but might fail > somewhere else. > > > > We have tried exporting the same filesystem via a standard > kernel NFS instead of the CES > > Ganesha-NFS, and then the problem doesn?t exist. > > > > So it is definitely related to the Ganesha NFS server, or its > interaction with the file system. > > ?> Will see if I can get a tcpdump of the issue. > > We see this, too. We cannot trigger it. Fortunately I have managed > to capture some logs with > debugging enabled. I have now dug into the ganesha 2.5.3 code and > I think the netgroup caching is > the culprit. > > Here some FULL_DEBUG output: > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for > export id 1 path /gpfsexport > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? 
?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :EXPORT ?(options=03303002 ? ? > ? ? ? ? ?, ? ? , ? ?, > ?? ? ?, ? ? ? ? ? ? ? , -- Deleg, ? ? ? ? ? ? ? ?, ? ? ? ?) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS > (options=42102002root_squash ? , ----, 3--, ---, > TCP, ----, Manage_Gids ? , ? ? ? ? , anon_uid= ? ?-2, anon_gid= ? > ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :default options > (options=03303002root_squash ? , ----, 34-, UDP, > TCP, ----, No Manage_Gids, -- Deleg, anon_uid= ? ?-2, anon_gid= ? > ?-2, none, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :Final options > (options=42102002root_squash ? , ----, 3--, ---, > TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ? ?-2, anon_gid= ? > ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute > :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to > access Export_Id 1 /gpfsexport, > vers=3, proc=18 > > The client "client1" is definitely a member of the "netgroup1". > But the NETGROUP_CLIENT lookups for > "netgroup2" and "netgroup3" can only happen if the netgroup > caching code reports that "client1" is > NOT a member of "netgroup1". > > I have also opened a support case at IBM for this. > > @Malahal: Looks like you have written the netgroup caching code, > feel free to ask for further > details if required. > > Kind regards, > > Ulrich Sibiller > > -- > Dipl.-Inf. Ulrich Sibiller ? ? ? ? ? science + computing ag > System Administration ? ? ? ? ? ? ? ? ? ?Hagellocher Weg 73 > ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 72070 Tuebingen, Germany > https://atos.net/de/deutschland/sc > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
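A hedged sketch of how the netgroup resolution can be cross-checked on a protocol node; netgroup1, client1.domain and 1.2.3.4 are the placeholder names used in the trace above:
# ask the configured NSS backend (here sssd) for the netgroup members
getent netgroup netgroup1
# repeat the lookup to catch intermittent answers from the sssd cache
for i in $(seq 1 50); do getent netgroup netgroup1 | grep -c client1.domain; done | sort | uniq -c
# confirm that the client address resolves to the host name used in the export match
getent hosts 1.2.3.4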
URL: From chris.schlipalius at pawsey.org.au Fri Oct 4 07:37:17 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Fri, 04 Oct 2019 14:37:17 +0800 Subject: [gpfsug-discuss] 2019 October 18th Australian Spectrum Scale User Group event - last call for user case speakers Message-ID: Hello all, This is the final announcement for the Spectrum Scale Usergroup Sydney Australia on Friday the 18th October 2019. All current Australian Spectrum Scale User Group event details can be found here: http://bit.ly/2YOFQ3u Last call for user case speakers please ? let me know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. This is a combined Spectrum Scale, Spectrum Archive, Spectrum Protect and Spectrum LSF event. We are looking forwards to a great Usergroup in Sydney. Thanks again to IBM for helping to arrange the venue and event booking. Best Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) GPFSUGAUS at gmail.com From mnaineni at in.ibm.com Fri Oct 4 11:55:20 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 4 Oct 2019 10:55:20 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch>, <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 4 16:51:34 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 4 Oct 2019 15:51:34 +0000 Subject: [gpfsug-discuss] Lenovo GSS Planned End-of-Support Message-ID: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi there, Anyone know for sure when Lenovo is planning to release it's last version of the GSS software for its GSS solutions? I figure someone might be sufficiently plugged into the development here. Thanks! - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdqfgAKCRCZv6Bp0Ryx vuDHAJ9vO2/G6YLVbnoifliLDztMcVhENgCg01jB7VhZA9M85hKUe2FUOrKRios= =4iyR -----END PGP SIGNATURE----- From ncalimet at lenovo.com Fri Oct 4 16:59:03 2019 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Fri, 4 Oct 2019 15:59:03 +0000 Subject: [gpfsug-discuss] [External] Lenovo GSS Planned End-of-Support In-Reply-To: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> References: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> Message-ID: Ryan, If the question really is for how long GSS will be supported, then maintenance releases are on the roadmap till at least 2022 in principle. If otherwise you are referring to the latest GSS code levels, then GSS 3.4b has been released late August. 
Regards, - Nicolas -- Nicolas Calimet, PhD | HPC System Architect | Lenovo DCG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Friday, October 4, 2019 17:52 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] Lenovo GSS Planned End-of-Support -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi there, Anyone know for sure when Lenovo is planning to release it's last version of the GSS software for its GSS solutions? I figure someone might be sufficiently plugged into the development here. Thanks! - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdqfgAKCRCZv6Bp0Ryx vuDHAJ9vO2/G6YLVbnoifliLDztMcVhENgCg01jB7VhZA9M85hKUe2FUOrKRios= =4iyR -----END PGP SIGNATURE----- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Fri Oct 4 17:15:08 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 4 Oct 2019 16:15:08 +0000 Subject: [gpfsug-discuss] [External] Lenovo GSS Planned End-of-Support In-Reply-To: References: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> Message-ID: <5228bcf4-fe1b-cfc7-e1aa-071131496011@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Yup, that's the question; thanks for the help. I'd heard a rumor that there was a 2020 date, and wanted to see if I could get any indication in particular as to whether that was true. Sounds like even if it's not 2022, it's probably not 2020. We're clear on the current version -- planning the upgrade at the moment . On 10/4/19 11:59 AM, Nicolas CALIMET wrote: > Ryan, > > If the question really is for how long GSS will be supported, then > maintenance releases are on the roadmap till at least 2022 in > principle. If otherwise you are referring to the latest GSS code > levels, then GSS 3.4b has been released late August. > > Regards, - Nicolas > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdwBAAKCRCZv6Bp0Ryx vjAWAJ9OGbVfhM0m+/NXCRzXo8raIj/tNwCeMtg0osqnl3l16J4TC3oZGw9xxk4= =utaK -----END PGP SIGNATURE----- From kkr at lbl.gov Fri Oct 4 21:53:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 4 Oct 2019 13:53:20 -0700 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? Message-ID: Hi, There is a flag with mmlsquota to prevent the potential of getting negative values back: -e Specifies that mmlsquota is to collect updated quota usage data from all nodes before displaying results. If -e is not specified, there is the potential to display negative usage values as the quota server may process a combination of up-to-date and back-level information. However, we are using the API to collectively show quotas across GPFS and non-GPFS filesystems via one user-driven command. We are getting negative values using the API. 
Does anyone know the -e equivalent for the API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Sat Oct 5 05:30:49 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Sat, 5 Oct 2019 10:00:49 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> References: , , <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> Message-ID: I would recommend opening a case, collect the default traces from both gateway and application (or protocol) nodes to check the RPC overhead. There should not be difference between mmap IO and regular IO for AFM filesets. Also note that refresh intervals are stored as part of inode and for the large number of file access it is possible that inodes are evicted as part of dcache shrinkage and next access to the same files might go to home for the revalidation. afmRefreshAsync option can be set at fleset level also. Looks like it is missing from the documentation, this will be corrected. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: gpfsug main discussion list Date: 10/03/2019 07:25 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org After further investigaion, it seems like this XDS software is using memory mapped io when operating on the files. Is it possible that MMAP IO has a higher performance hit by AFM than regular file access? /Andreas ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Andreas Mattsson Skickat: den 1 oktober 2019 08:33:35 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, I've tried increasing all the refresh intervals, but even at 300 seconds, there is very little performance increase. The job runs in several steps, and gets held up at two places, as far as I can see. First at a kind of parallelisation step where about 1000-3000 files are created in the current working folder on a single compute node, and then at a step where lots of small output files are written on each of the compute nodes involved in the job. Comparing with running the same data set on a non-AFM cache fileset in the same storage system, it runs at least a factor 5 slower, even with really high refresh intervals. In the Scale documentation, it states that the afmRefreshAsync is only configurable cluster wide. Is it also configurable on a per-fileset level? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.3/com.ibm.spectrum.scale.v5r03.doc/bl1adm_configurationparametersAFM.htm The software is XDS, http://xds.mpimf-heidelberg.mpg.de/ Unfortunately it is a closed source software, so it is not possible to adapt the software. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. 
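For reference, a hedged sketch of the two knobs discussed in this thread; gpfs01 and lab_cache are placeholder names, the attribute names are taken from the AFM configuration parameters page linked earlier, and the values and their changeability on a linked fileset should be verified against your release:
# raise the revalidation intervals (in seconds) on the cache fileset
mmchfileset gpfs01 lab_cache -p afmFileLookupRefreshInterval=300 -p afmDirLookupRefreshInterval=300 -p afmFileOpenRefreshInterval=300 -p afmDirOpenRefreshInterval=300
# request asynchronous revalidation cluster-wide (needs 5.0.3.x on both clusters, as noted above)
mmchconfig afmRefreshAsync=yes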
Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Venkateswara R Puvvada Skickat: den 27 september 2019 10:23:13 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=vrw7qt4uEH-dBuEZSxUvPQM-SJOC0diQptL6vnfxCQA&s=rbRvqgv05seDPo5wFgK2jlRkzvHtU7y7zoNQ3rDV0d0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From st.graf at fz-juelich.de Mon Oct 7 08:22:02 2019 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 7 Oct 2019 09:22:02 +0200 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? In-Reply-To: References: Message-ID: <9c1fcd81-d947-e857-ffc8-b68d17142bfb@fz-juelich.de> Hi Kristi, I just want to mention that we have a ticket right now at IBM because of negative quota values. In our case even the '-e' does not work: [root at justnsd01a ~]#? 
mmlsquota -j hpsadm -e largedata Block Limits | File Limits Filesystem type KB quota limit in_doubt grace | files quota limit in_doubt grace Remarks largedata FILESET -45853247616 536870912000 590558003200 0 none | 6 3000000 3300000 0 none The solution offered by support is to run a 'mmcheckquota'. we are still in discussion. Stephan On 10/4/19 10:53 PM, Kristy Kallback-Rose wrote: > Hi, > > There is a flag with mmlsquota to prevent the potential of getting > negative values back: > > -e > Specifies that mmlsquota is to collect updated quota usage data from all > nodes before displaying results. If -e is not specified, there is the > potential to display negative usage values as the quota server may > process a combination of up-to-date and back-level information. > > > However, we are using the API to collectively show quotas across GPFS > and non-GPFS filesystems via one user-driven command. We are getting > negative values using the API. Does anyone know the -e equivalent for > the API? > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm > > Thanks, > Kristy > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5322 bytes Desc: S/MIME Cryptographic Signature URL: From jonathan.buzzard at strath.ac.uk Mon Oct 7 15:07:55 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 7 Oct 2019 14:07:55 +0000 Subject: [gpfsug-discuss] Large in doubt on fileset Message-ID: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> I have a DSS-G system running 4.2.3-7, and on Friday afternoon became aware that there is a very large (at least I have never seen anything on this scale before) in doubt on a fileset. It has persisted over the weekend and is sitting at 17.5TB, with the fileset having a 150TB quota and only 82TB in use. There is a relatively large 26,500 files in doubt, though there is no quotas on file numbers for the fileset. This has come down from some 47,500 on Friday when the in doubt was a shade over 18TB. The largest in doubt I have seen in the past was in the order of a few hundred GB under very heavy write that went away very quickly after the writing stopped. There is no evidence of heavy writing going on in the file system so I am perplexed as to why the in doubt is remaining so high. Any thoughts as to what might be going on? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From pinto at scinet.utoronto.ca Mon Oct 7 15:24:38 2019 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 7 Oct 2019 10:24:38 -0400 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> Message-ID: <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> We run DSS as well, also 4.2.x versions, and large indoubt entries are common on our file systems, much larger than what you are seeing, for USR, GRP and FILESET. It didn't use to be so bad on versions 3.4|3.5 in other IBM appliances (GSS, ESS), even DDN's or Cray G200. Under 4.x series the internal automatic mechanism to reconcile accounting seems very laggy by default, and I couldn't find (yet) a config parameter to adjust this. I stopped trying to understand why this happens. Our users are all subject to quotas, and can't wait indefinitely for this reconciliation. I just run mmcheckquota every 6 hours via a crontab. I hope version 5 is better. Will know in a couple of months. Jaime On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote: > > I have a DSS-G system running 4.2.3-7, and on Friday afternoon became > aware that there is a very large (at least I have never seen anything > on this scale before) in doubt on a fileset. It has persisted over the > weekend and is sitting at 17.5TB, with the fileset having a 150TB quota > and only 82TB in use. > > There is a relatively large 26,500 files in doubt, though there is no > quotas on file numbers for the fileset. This has come down from some > 47,500 on Friday when the in doubt was a shade over 18TB. > > The largest in doubt I have seen in the past was in the order of a few > hundred GB under very heavy write that went away very quickly after the > writing stopped. > > There is no evidence of heavy writing going on in the file system so I > am perplexed as to why the in doubt is remaining so high. > > Any thoughts as to what might be going on? > > > JAB. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From TOMP at il.ibm.com Mon Oct 7 17:22:13 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Mon, 7 Oct 2019 19:22:13 +0300 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> Message-ID: Hi, The major change around 4.X in quotas was the introduction of dynamic shares. In the past, every client share request was for constant number of blocks ( 20 blocks by default). For high performing system, it wasn't enough sometime ( imagine 320M for nodes are writing at 20GB/s). So, dynamic shares means that a client node can request 10000 blocks etc. etc. ( it doesn't mean that the server will provide those...). OTOH, node failure will leave more "stale in doubt" capacity since the server don't know how much of the share was actually used. Imagine a client node getting 1024 blocks ( 16G), using 20M and crashing. >From the server perspective, there are 16G "unknown", now multiple that by multiple nodes... 
The only way to solve it is indeed to execute mmcheckquota - but as you probably know, its not cheap. So, do you experience large number of node expels/crashes etc. that might be related to that ( otherwise, it might be some other bug that needs to be fixed...). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jaime Pinto To: gpfsug-discuss at spectrumscale.org Date: 07/10/2019 17:40 Subject: [EXTERNAL] Re: [gpfsug-discuss] Large in doubt on fileset Sent by: gpfsug-discuss-bounces at spectrumscale.org We run DSS as well, also 4.2.x versions, and large indoubt entries are common on our file systems, much larger than what you are seeing, for USR, GRP and FILESET. It didn't use to be so bad on versions 3.4|3.5 in other IBM appliances (GSS, ESS), even DDN's or Cray G200. Under 4.x series the internal automatic mechanism to reconcile accounting seems very laggy by default, and I couldn't find (yet) a config parameter to adjust this. I stopped trying to understand why this happens. Our users are all subject to quotas, and can't wait indefinitely for this reconciliation. I just run mmcheckquota every 6 hours via a crontab. I hope version 5 is better. Will know in a couple of months. Jaime On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote: > > I have a DSS-G system running 4.2.3-7, and on Friday afternoon became > aware that there is a very large (at least I have never seen anything > on this scale before) in doubt on a fileset. It has persisted over the > weekend and is sitting at 17.5TB, with the fileset having a 150TB quota > and only 82TB in use. > > There is a relatively large 26,500 files in doubt, though there is no > quotas on file numbers for the fileset. This has come down from some > 47,500 on Friday when the in doubt was a shade over 18TB. > > The largest in doubt I have seen in the past was in the order of a few > hundred GB under very heavy write that went away very quickly after the > writing stopped. > > There is no evidence of heavy writing going on in the file system so I > am perplexed as to why the in doubt is remaining so high. > > Any thoughts as to what might be going on? > > > JAB. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=2KzJ8YjgXm5NsAjcpquw6pMVJFbLUBZ-KEQb2oHFYqs&s=esG-w1Wj_wInSHpT5fEhqVQMqpR15ZXaGxoQmjOKdDc&e= ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=2KzJ8YjgXm5NsAjcpquw6pMVJFbLUBZ-KEQb2oHFYqs&s=dxj6p74pt5iaKKn4KvMmMPyLcUD5C37HbIc2zX-iWgY&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Oct 8 11:45:38 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Oct 2019 10:45:38 +0000 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> Message-ID: <841c1fd793b4179ea8e27b88f3ed1c7e0f76cb4e.camel@strath.ac.uk> On Mon, 2019-10-07 at 19:22 +0300, Tomer Perry wrote: [SNIP] > > So, do you experience large number of node expels/crashes etc. that > might be related to that ( otherwise, it might be some other bug that > needs to be fixed...). > Not as far as I can determine. The logs show only 58 expels in the last six months and around 2/3rds of those where on essentially dormant nodes that where being used for development work on fixing issues with the xcat node deployment for the compute nodes (triggering an rinstall on a node that was up with GPFS mounted but actually doing nothing). I have done an mmcheckquota which didn't take long to complete and now I the "in doubt" is a more reasonable sub 10GB. I shall monitor what happens more closely in future. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Tue Oct 8 14:15:48 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Oct 2019 09:15:48 -0400 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? In-Reply-To: References: Message-ID: Kristy, there is no equivalent to the -e option in the quota API. If your application receives negative quota values it is suggested that you use the mmlsquota command with the -e option to obtain the most recent quota usage information, or run the mmcheckquota command. Using either the -e option to mmlsquota or the mmcheckquota is an IO intensive operation so it would be wise not to run the command when the system is heavily loaded. Note that using the mmcheckquota command does provide QoS options to mitigate the impact of the operation on the cluster. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 10/04/2019 04:53 PM Subject: [EXTERNAL] [gpfsug-discuss] Quota via API anyway to avoid negative values? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, There is a flag with mmlsquota to prevent the potential of getting negative values back: -e Specifies that mmlsquota is to collect updated quota usage data from all nodes before displaying results. If -e is not specified, there is the potential to display negative usage values as the quota server may process a combination of up-to-date and back-level information. 
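Since there is no -e equivalent in the quota API, one workaround in line with the suggestion above is to have the user-facing wrapper fall back to mmlsquota -e whenever the API hands back a negative usage. A rough sketch only, assuming bash and that the field names in the -Y (machine-readable) output match your release; the filesystem and fileset names are taken from the example earlier in this thread:

#!/bin/bash
# Fallback path for a quota-reporting wrapper: when the API returns a
# negative usage, re-read it with 'mmlsquota -e', which makes the quota
# server collect up-to-date usage from all nodes first.
DEVICE=largedata      # filesystem name from the earlier example
FILESET=hpsadm        # fileset name from the earlier example

OUT=$(/usr/lpp/mmfs/bin/mmlsquota -j "$FILESET" -e -Y "$DEVICE")
# -Y output is colon-delimited and self-describing, so locate the
# blockUsage column from the HEADER line rather than hard-coding it.
HDR=$(echo "$OUT" | grep ':HEADER:')
ROW=$(echo "$OUT" | grep '^mmlsquota:' | grep -v ':HEADER:' | head -n 1)
COL=$(echo "$HDR" | tr ':' '\n' | grep -n -x 'blockUsage' | cut -d: -f1)
USAGE_KB=$(echo "$ROW" | cut -d: -f"$COL")
echo "Fileset $FILESET on $DEVICE: ${USAGE_KB} KB in use (collected with -e)"

This only papers over stale values on the reporting side; the in-doubt figures themselves are only reconciled by an occasional mmcheckquota run, as noted above.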
However, we are using the API to collectively show quotas across GPFS and non-GPFS filesystems via one user-driven command. We are getting negative values using the API. Does anyone know the -e equivalent for the API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm Thanks, Kristy_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hdhTNLoVRkMglSs8c9Ho37FKFZUJrCmrXG5pXqjtFbE&s=wfHn6xg9_2qzVFdBAthevvEHreS934rP1w88f3jSFcs&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Oct 9 16:50:31 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 9 Oct 2019 17:50:31 +0200 Subject: [gpfsug-discuss] =?utf-8?q?Fw=3A___Agenda_and_registration_link_/?= =?utf-8?q?/_Oct_10_-_Spectrum=09Scale_NYC_User_Meeting?= Message-ID: Reminder about the user meeting in NYC tomorrow. https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 09/10/2019 17:46 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 20/09/2019 10:12 Subject: [EXTERNAL] [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. 
Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=x_he-vxYPdTCut1I-gX7dq5MQmsSZA_1952yvpisLn0&s=ghgxcu8zRWQLv9DIXJ3-CX14SDFrx3hYKsjt-_IWZIM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Thu Oct 10 21:43:45 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 10 Oct 2019 15:43:45 -0500 Subject: [gpfsug-discuss] waiters and files causing waiters Message-ID: is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Oct 10 22:26:35 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 10 Oct 2019 17:26:35 -0400 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: The short answer is there is no easy way to determine what file/directory a waiter may be related. Generally, it is not necessary to know the file/directory since a properly sized/configured cluster should not have long waiters occurring, unless there is some type of failure in the cluster. If you were to capture sufficient information across the cluster you might be able to work out the file/directory involved in a long waiter but it would take either trace, or combing through lots of internal data structures. It would be helpful to know more details about your cluster to provide suggestions for what may be causing the long waiters. I presume you are seeing them on a regular basis and would like to understand why they are occurring. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
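For what it is worth, the "capture sufficient information across the cluster" step can be approximated with a couple of stock commands before opening a case. A sketch only, assuming mmdsh can reach all nodes and that the filesystem is mounted at /gpfs/fs0 (placeholder path):

# Snapshot the waiters on every node at the moment the slowdown is seen:
/usr/lpp/mmfs/bin/mmdsh -N all '/usr/lpp/mmfs/bin/mmdiag --waiters' > /tmp/waiters.$(date +%Y%m%d%H%M)

# More detail (thread names, condition variables) from one suspect node:
/usr/lpp/mmfs/bin/mmfsadm dump waiters > /tmp/waiters.dump

# On the node holding the longest waiter, list which processes have
# files open on the affected filesystem; on a lightly loaded client
# this is often enough to spot the culprit:
lsof /gpfs/fs0

None of this maps a waiter to a specific file by itself, but it usually narrows things down to a node and a process.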
From: Damir Krstic To: gpfsug main discussion list Date: 10/10/2019 04:44 PM Subject: [EXTERNAL] [gpfsug-discuss] waiters and files causing waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=9T66XmHIdF5y7JaNmf28qRGIn35K4t-9H7vwGkDMjgo&s=ncg0MQla29iX--sQeAmcB2XqE3_7zSFGmhnDgj9s--w&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex at calicolabs.com Thu Oct 10 23:33:06 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 10 Oct 2019 15:33:06 -0700 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: If the waiters are on a compute node and there is not much user work running there, then the open files listed by lsof will probably be the culprits. On Thu, Oct 10, 2019 at 1:44 PM Damir Krstic wrote: > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to > figure out which files or directories access (whether it's read or write) > is causing long-er waiters? > > in all my looking i have not been able to get that information out of > various diagnostic commands. > > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 00:05:16 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 10 Oct 2019 23:05:16 +0000 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> I?ll dig through my notes. I had a similar situation and an engineer taught me how to do it. It?s a bit involved though. Not something you?d bother with for something transient. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 10, 2019, at 16:44, Damir Krstic wrote: ? is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Oct 11 17:07:30 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 11 Oct 2019 16:07:30 +0000 Subject: [gpfsug-discuss] User Group Meeting at SC19 - Registration is Open! 
Message-ID: <9C59AEAC-C26D-47ED-9321-BCC6A58F2E05@nuance.com> Join us at SC19 for the user group meeting on Sunday November 17th at the Hyatt Regency in Denver! This year there will be a morning session for new users to Spectrum Scale. Afternoon portion will be a collection of updates from IBM and user/sponsor talks. Details Here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ (watch here for agenda updates) You do need to pre-register here: http://www.ibm.com/events/2019/SC19_BC This year we will have a limited number of box lunches available for users, free of charge. We?ll also have WiFi access for the attendees - Huzzah! Many thanks to our sponsors: IBM, Starfish Software, Mark III Systems, and Lenovo for helping us make this event possible and free of charge to all attendees. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bamirzadeh at tower-research.com Fri Oct 11 18:04:08 2019 From: bamirzadeh at tower-research.com (Behrooz Amirzadeh) Date: Fri, 11 Oct 2019 13:04:08 -0400 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> References: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> Message-ID: I think it depends on the type of deadlock. For example, if hung nodes are the cause of the deadlock. I don't think there will be any files to go after. I've seen that it is possible in certain cases but no guarantees. When the deadlock is detected you can look at the internaldump that gets created on the deadlock node, for example: ===== dump deadlock ===== Current time 2019-09-24_10:17:30-0400 Waiting 904.5729 sec since 10:02:25, on node aresnode7132, thread 3584968 SyncFSWorkerThread: on ThCond 0x18042226DB8 (LkObjCondvar), reason 'waiting for RO lock' Then you search in the same file for the ThCond further down. You'll most likely see that it is associated with a mutex ===== dump condvar ===== Current time 2019-09-24_10:17:32-0400 . . 
'LkObjCondvar' at 0x18042226DB8 (0xFFFFC90042226DB8) (mutex 'InodeCacheObjMutex' at 0x18042226C08 (0xFFFFC90042226C08 PTR_OK)) waitCount 1 condvarEventWordP 0xFFFF880DB4AAF088 Then you'll search for the that mutex in the same file ===== dump selected_files ===== Current time 2019-09-24_10:17:32-0400 Files in stripe group gpfs0: Selected: LkObj::mostWanted: 0x18042226D80 lock_state=0x2000000000000000 xlock_state=0x0 lock_flags=0x11 OpenFile: 429E985A0BFE280A:000000008285ECBD:0000000000000000 @ 0x18042226BD8 cach 1 ref 1 hc 3 tc 6 mtx 0x18042226C08 Inode: valid eff token xw @ 0x18042226D80, ctMode xw seq 175 lock state [ xw ] x [] flags [ dmn wka ] writer 39912 hasWaiters 1 0 Mnode: valid eff token xw @ 0x18042226DD0, ctMode xw seq 175 DMAPI: invalid eff token nl @ 0x18042226D30, ctMode nl seq 174 SMBOpen: valid eff token (A: M D: ) @ 0x18042226C60, ctMode (A: M D: ) Flags 0x30 (pfro+pfxw) seq 175 lock state [ (nil) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x18042226CD0, ctMode wf Flags 0x30 (pfro+pfxw) seq 175 BR: @ 0x18042226E30, ctMode nl Flags 0x10 (pfro) seq 175 treeP 0x18048C1EFB8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <1335> Fcntl: @ 0x18042226E58, ctMode nl Flags 0x30 (pfro+pfxw) seq 175 treeP 0x1801EBA7EE8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <1335> * inode 2189814973* snap 0 USERFILE nlink 1 genNum 0x2710E0CC mode 0200100644: -rw-r--r-- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 0 lastFrom 65535 switchCnt 0 BRL nXLocksOrRelinquishes 6 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 lastAllocLsn 0xB8740C5E metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 1 dirty status: dirty fileDirty 1 fileDirtyOrUncommitted 1 dirtiedSyncNum 81078 inodeValid 1 inodeDirtyCount 5 objectVersion 1 mtimeDirty 1 flushVersion 8983 mnodeChangeCount 1 dirtyDataBufs 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 10213 synchedFileSize 0 indirectionLevel 1 atime 1569333733.493440000 mtime 1569333742.784833000 ctime 1569333742.784712266 crtime 1569333733.493440000 * owner uid 6572 gid 3047* If you were lucky and all of these were found you can get the inode and the uid/gid of the owner of the file. If you happen to catch it quick enough you'll be able to find the file with lsof. Otherwise later with an ilm policy run if the file has not been deleted by the user. Behrooz On Thu, Oct 10, 2019 at 7:05 PM Ryan Novosielski wrote: > I?ll dig through my notes. I had a similar situation and an engineer > taught me how to do it. It?s a bit involved though. Not something you?d > bother with for something transient. > > -- > ____ > || \\UTGERS, > |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, > Newark > `' > > On Oct 10, 2019, at 16:44, Damir Krstic wrote: > > ? > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to > figure out which files or directories access (whether it's read or write) > is causing long-er waiters? > > in all my looking i have not been able to get that information out of > various diagnostic commands. 
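To expand on the "ilm policy run" part: once 'dump selected_files' has given you an inode number and owner (2189814973 and uid 6572 in the example above), a small list-only policy run can turn that back into a path. A sketch, assuming the stripe group is gpfs0 as in the dump and that /tmp has room for the candidate lists:

cat > /tmp/find_inode.pol <<'EOF'
RULE EXTERNAL LIST 'hits' EXEC ''
RULE 'byinode' LIST 'hits' WHERE INODE = 2189814973 AND USER_ID = 6572
EOF

# Deferred run: nothing is migrated or deleted; the matching path(s)
# are written to /tmp/waiterfile.list.hits
/usr/lpp/mmfs/bin/mmapplypolicy gpfs0 -P /tmp/find_inode.pol -I defer -f /tmp/waiterfile

(The sample utility under /usr/lpp/mmfs/samples/util, tsfindinode, does a similar inode-to-path lookup if it is built on your nodes.)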
> > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 18:43:15 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 11 Oct 2019 17:43:15 +0000 Subject: [gpfsug-discuss] Quotas and AFM Message-ID: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From S.J.Thompson at bham.ac.uk Fri Oct 11 20:56:04 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 19:56:04 +0000 Subject: [gpfsug-discuss] Quotas and AFM Message-ID: Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? 
Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 21:05:15 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 11 Oct 2019 20:05:15 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: Message-ID: Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. 
At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 11 21:10:20 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 20:10:20 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , Message-ID: Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. 
Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? 
We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 11 21:21:59 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 20:21:59 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , , Message-ID: Oh and I forgot. This only works if you precache th data from home. Otherwise the disk usage at cache is only what you cached, as you don't know what size it is from home. Unless something has changed recently at any rate. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Friday, October 11, 2019 9:10:20 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. 
If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Oct 14 06:11:21 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 14 Oct 2019 10:41:21 +0530 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> References: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> Message-ID: As of today AFM does not support replication or caching of the filesystem or fileset level metadata like quotas, replication factors etc.. , it only supports replication of user's metadata and data. Users have to make sure that same quotas are set at both cache and home clusters. An error message is logged (mmfs.log) at AFM cache gateway if the home have quotas exceeded, and the queue will be stuck until the quotas are increased at the home cluster. 
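When the queue does get stuck this way, the state is visible from the cache side. A minimal check, with placeholder filesystem/fileset names and a deliberately loose grep since the exact wording of the logged error varies:

# Queue state and length for the AFM fileset, run on the cache cluster:
/usr/lpp/mmfs/bin/mmafmctl cachefs getstate -j home

# The quota error reported by home is written to the gateway node's log:
grep -i quota /var/adm/ras/mmfs.log.latest | tail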
~Venkat (vpuvvada at in.ibm.com) From: Ryan Novosielski To: gpfsug main discussion list Date: 10/11/2019 11:13 PM Subject: [EXTERNAL] [gpfsug-discuss] Quotas and AFM Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=v6Rlb90lfAveMK0img3_DIq6tq6dce4WXaxNhN0TDBQ&s=PNlMZJgKMhodVCByv07nOOiyF2Sr498Rd4NmIaOkL9g&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Oct 14 07:29:05 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 14 Oct 2019 11:59:05 +0530 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , , Message-ID: As Simon already mentioned, set the similar quotas at both cache and home clusters to avoid the queue stuck problem due to quotas being exceeds home. >At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet >quotas. AFM will support dependent filesets from 5.0.4. Dependent filesets can be created at the cache in the independent fileset and set the same quotas from the home >We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. AFM uses some inode space to store the remote file attributes like file handle, file times etc.. as part of the EAs. If the file does not have hard links, maximum inode space used by the AFM is around 200 bytes. AFM cache can store the file's data in the inode if it have 200 bytes of more free space in the inode, otherwise file's data will be stored in subblock rather than using the full block. 
For example if the inode size is 4K at both cache and home, if the home file size is 3k and inode is using 300 bytes to store the file metadata, then free space in the inode at the home will be 724 bytes(4096 - (3072 + 300)). When this file is cached by the AFM , AFM adds internal EAs for 200 bytes, then the free space in the inode at the cache will be 524 bytes(4096 - (3072 + 300 + 200)). If the filesize is 3600 bytes at the home, AFM cannot store the data in the inode at the cache. So AFM stores the file data in the block only if it does not have enough space to store the internal EAs. ~Venkat (vpuvvada at in.ibm.com) From: Simon Thompson To: gpfsug main discussion list Date: 10/12/2019 01:52 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Quotas and AFM Sent by: gpfsug-discuss-bounces at spectrumscale.org Oh and I forgot. This only works if you precache th data from home. Otherwise the disk usage at cache is only what you cached, as you don't know what size it is from home. Unless something has changed recently at any rate. Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Friday, October 11, 2019 9:10:20 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. 
At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=FQMV8_Ivetm1R6_TcCWroPT58pjhPJgL39pgOdQEiqw&s=DfvksQLrKgv0OpK3Dr5pR-FUkhNddIvieh9_8h1jyGQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 13:34:33 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 12:34:33 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs Message-ID: We are in the process of changing the way GPFS assigns UID/GIDs from internal tdb to using AD RIDs with an offset that matches our linux systems. We, therefore, need to change the ACLs for all the files in GPFS (up to 80 million). We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs being applied. (This system was set up 14 years ago and has changed roles over time) We are running on linux, so need to have POSIX permissions enabled. 
What I want to know for those in a similar environment, what do you have as the POSIX owner and group, when NFSv4 ACLs are in use? root:root or do you have all files owned by a filesystem administrator account and group: : on our samba shares we have : admin users = @ So don't actually need the group defined in POSIX. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 13:51:55 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 12:51:55 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs Message-ID: <531FC6DE-7928-4A4F-B444-DC9D1D78F705@bham.ac.uk> Hi Paul, We use both Windows and Linux with our FS but only have NFSv4 ACLs enabled (we do also set ?chmodAndSetAcl? on the fileset which makes chmod etc work whilst not breaking the ACL badly). We?ve only found 1 case where POSIX ACLs were needed, and really that was some other IBM software that didn?t understand ACLs (which is now fixed). The groups exist in both AD and our internal LDAP where they have gidNumbers assigned. For our research projects we set the following as the default on the directory: $ mmgetacl some-project #NFSv4 ACL #owner:root #group:gITS_BEAR_2019- some-project special:owner@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:----:allow (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED special:owner@:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwx-:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Simon From: on behalf of Paul Ward Reply to: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 15 October 2019 at 13:34 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] default owner and group for POSIX ACLs We are in the process of changing the way GPFS assigns UID/GIDs from internal tdb to using AD RIDs with an offset that matches our linux systems. We, therefore, need to change the ACLs for all the files in GPFS (up to 80 million). We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs being applied. (This system was set up 14 years ago and has changed roles over time) We are running on linux, so need to have POSIX permissions enabled. What I want to know for those in a similar environment, what do you have as the POSIX owner and group, when NFSv4 ACLs are in use? root:root or do you have all files owned by a filesystem administrator account and group: : on our samba shares we have : admin users = @ So don?t actually need the group defined in POSIX. 
Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 15 15:30:28 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 15 Oct 2019 14:30:28 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Oct 15 16:41:35 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 15:41:35 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. 
> > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Tue Oct 15 17:09:14 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 15 Oct 2019 16:09:14 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 17:15:50 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 16:15:50 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: An amalgamated answer... > You do realize that will mean backing everything up again... From the tests that I have done, it appears not. A Spectrum protect incremental backup performs an 'update' when the ACL is changed via mmputacl or chown. when I do a backup after an mmputacl or chown ACL change on a migrated file, it isn't recalled, so it cant be backing up the file. If I do the same change from windows over a smb mount, it does cause the file to be recalled and backedup. > ...I am not sure why you need POSIX ACL's if you are running Linux... From what I have recently read... https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." As I said this system has had roles added to it. The original purpose was to only support NFS exports, then as a staging area for IT, as end user access wasn't needed, only POSIX permissions were used. No it has end user SMB mounts. >?chmodAndSetAcl? Saw this recently - will look at changing to that! https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_authoriziefileprotocolusers.htm "To allow proper use of ACLs, it is recommended to prevent chmod from overwriting the ACLs by setting this parameter to setAclOnly or chmodAndSetAcl." >#owner:root OK so you do have root as the owner. > special:owner@:rwxc:allow:FileInherit:DirInherit And have it propagated to children. > group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit We by default assign two groups to a folder, a RW and R only. > special:everyone@:----:allow > special:owner@:rwxc:allow > special:group@:rwx-:allow I have been removing these. 
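(On the chmodAndSetAcl point above: if I have read the docs correctly it is set per fileset, something along the lines of

   mmchfileset gpfsdevice somefileset --allow-permission-change chmodAndSetAcl

though I still need to verify that option name on our 4.2.3 build.)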
This seems to work, but was set via windows: POSIX: d--------- 2 root root 512 Apr 11 2019 #NFSv4 ACL #owner:root #group:root #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED # NULL_SACL group:dg--ro:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:dg--rwm:rwx-:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:dl-:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED So is root as the owner the norm? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 15 October 2019 15:30 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C54e024b8b52b4a70208e08d7517c47fc%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637067466552637538&sdata=v43g1MEBnRBZP%2B5J7ORvywIq6poqhK24fTsCco0IEDo%3D&reserved=0 From p.ward at nhm.ac.uk Tue Oct 15 17:18:15 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 16:18:15 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? 
> root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Oct 15 17:49:34 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 15 Oct 2019 16:49:34 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 19:27:01 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 18:27:01 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: I have tested replacing POSIX with NFSv4, I have altered POSIX and altered NFSv4. The example below is NFSv4 changed to POSIX I have also tested on folders. Action Details Pre Changes File is backed up, migrated and has a nfsv4 ACL > ls -l ---------- 1 root 16777221 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mb-9.dat >mmgetacl #NFSv4 ACL #owner:root #group:16777221 group:1399645580:rwx-:allow:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:16783540:rwx-:allow:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:16777360:r-x-:allow:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:1399621272:r-x-:allow:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Erase the nfsv4 acl chown root:root chmod 770 POSIX permissions changed and NFSv4 ACL gone > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mb-9.dat >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- Incremental backup Backup ?updates? the backup, but doesn?t transfer any data. dsmc incr "100mb-9.dat" IBM Tivoli Storage Manager Command Line Backup-Archive Client Interface Client Version 7, Release 1, Level 6.4 Client date/time: 10/15/2019 17:57:59 (c) Copyright by IBM Corporation and other(s) 1990, 2016. All Rights Reserved. 
Node Name: NHM-XXX-XXX Session established with server TSM-XXXXXX: Windows Server Version 7, Release 1, Level 7.0 Server date/time: 10/15/2019 17:57:58 Last access: 10/15/2019 17:57:52 Accessing as node: XXX-XXX Incremental backup of volume '100mb-9.dat' Updating--> 102,400,000 /?/100mb-9.dat [Sent] Successful incremental backup of '/?/100mb-9.dat' Total number of objects inspected: 1 Total number of objects backed up: 0 Total number of objects updated: 1 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of objects grew: 0 Total number of retries: 0 Total number of bytes inspected: 97.65 MB Total number of bytes transferred: 0 B Data transfer time: 0.00 sec Network data transfer rate: 0.00 KB/sec Aggregate data transfer rate: 0.00 KB/sec Objects compressed by: 0% Total data reduction ratio: 100.00% Elapsed processing time: 00:00:01 Post backup Active Backup timestamp hasn?t changed, and file is still migrated. > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mbM/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mbM/100mb-9.dat >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- Restore dsmc restore "100mb-9.dat" "100mb-9.dat.restore" IBM Tivoli Storage Manager Command Line Backup-Archive Client Interface Client Version 7, Release 1, Level 6.4 Client date/time: 10/15/2019 18:02:09 (c) Copyright by IBM Corporation and other(s) 1990, 2016. All Rights Reserved. Node Name: NHM-XXX-XXX Session established with server TSM-XXXXXX: Windows Server Version 7, Release 1, Level 7.0 Server date/time: 10/15/2019 18:02:08 Last access: 10/15/2019 18:02:07 Accessing as node: HSM-NHM Restore function invoked. Restoring 102,400,000 /?/100mb-9.dat --> /?/100mb-9.dat.restore [Done] Restore processing finished. Total number of objects restored: 1 Total number of objects failed: 0 Total number of bytes transferred: 97.66 MB Data transfer time: 1.20 sec Network data transfer rate: 83,317.88 KB/sec Aggregate data transfer rate: 689.11 KB/sec Elapsed processing time: 00:02:25 Restored file Restored file has the same permissions as the last backup > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat.restore > dsmls 102400000 102400000 160 r 100mb-9.dat.restore > dsmc q backup ?? -inac ANS1092W No files matching search criteria were found >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- I have just noticed: File backedup with POSIX ? restored file permissions POSIX File backedup with POSIX, changed to NFSv4 permissions, incremental backup ? restore file permissions POSIX File backedup with NFSv4, Changed to POSIX permissions, incremental backup ? restore file permissions POSIX File backedup with NFSv4, restore file permissions NFSv4 (there may be other variables involved) Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:50 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs Thanks Paul. Could you please clarify which ACL you changed, the GPFS NFSv4 ACL or the POSIX ACL? 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Paul Ward > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 12:18 PM Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 19:46:06 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 18:46:06 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: Only the top level of the project is root:root, not all files. The owner inherit is like CREATOROWNER in Windows, so the parent owner isn't inherited, but the permission inherits to newly created files. It was a while ago we worked out our permission defaults but without it we could have users create a file/directory but not be able to edit/change it as whilst the group had permission, the owner didn't. I should note we are all at 5.x code and not 4.2. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul Ward Sent: Tuesday, October 15, 2019 5:15:50 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs An amalgamated answer... > You do realize that will mean backing everything up again... >From the tests that I have done, it appears not. A Spectrum protect incremental backup performs an 'update' when the ACL is changed via mmputacl or chown. when I do a backup after an mmputacl or chown ACL change on a migrated file, it isn't recalled, so it cant be backing up the file. If I do the same change from windows over a smb mount, it does cause the file to be recalled and backedup. > ...I am not sure why you need POSIX ACL's if you are running Linux... >From what I have recently read... https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." As I said this system has had roles added to it. The original purpose was to only support NFS exports, then as a staging area for IT, as end user access wasn't needed, only POSIX permissions were used. No it has end user SMB mounts. >?chmodAndSetAcl? Saw this recently - will look at changing to that! https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_authoriziefileprotocolusers.htm "To allow proper use of ACLs, it is recommended to prevent chmod from overwriting the ACLs by setting this parameter to setAclOnly or chmodAndSetAcl." >#owner:root OK so you do have root as the owner. > special:owner@:rwxc:allow:FileInherit:DirInherit And have it propagated to children. > group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit We by default assign two groups to a folder, a RW and R only. > special:everyone@:----:allow > special:owner@:rwxc:allow > special:group@:rwx-:allow I have been removing these. 
This seems to work, but was set via windows: POSIX: d--------- 2 root root 512 Apr 11 2019 #NFSv4 ACL #owner:root #group:root #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED # NULL_SACL group:dg--ro:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:dg--rwm:rwx-:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:dl-:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED So is root as the owner the norm? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 15 October 2019 15:30 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C54e024b8b52b4a70208e08d7517c47fc%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637067466552637538&sdata=v43g1MEBnRBZP%2B5J7ORvywIq6poqhK24fTsCco0IEDo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 19:50:54 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 18:50:54 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, , Message-ID: Fred, I thought like you that an ACL change caused a backup with mmbackup. Maybe only if you change the NFSv4 ACL. I'm sure it's documented somewhere and there is a flag to Protect to stop this from happening. Maybe a POSIX permission (setfacl style) doesn't trigger a backup. This would tie in with Paul's suggestion that changing via SMB caused the backup to occur. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of stockf at us.ibm.com Sent: Tuesday, October 15, 2019 5:49:34 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs Thanks Paul. Could you please clarify which ACL you changed, the GPFS NFSv4 ACL or the POSIX ACL? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Paul Ward Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 12:18 PM Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. 
(This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 15 21:34:34 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 15 Oct 2019 20:34:34 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: On 15/10/2019 17:15, Paul Ward wrote: [SNIP] >> ...I am not sure why you need POSIX ACL's if you are running Linux... > From what I have recently read... > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm > "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." > Only if you are using the inbuilt kernel NFS server, which IMHO is awful from a management perspective. That is you have zero visibility into what the hell it is doing when it all goes pear shaped unless you break out dtrace. I am not sure that using dtrace on a production service to find out what is going on is "best practice". It also in my experience stops you cleanly shutting down most of the time. The sooner it gets removed from the kernel the better IMHO. If you are using protocol nodes which is the only supported option as far as I am aware then that does not apply. I would imagined if you are rolling your own Ganesha NFS server it won't matter either. Checking the code of the FSAL in Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. My understanding was one of the drivers for using Ganesha as an NFS server with GPFS was you can write a FSAL to do just that, in the same way as on Samba you load the vfs_gpfs module, unless you are into self flagellation I guess. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From YARD at il.ibm.com Wed Oct 16 05:41:39 2019 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 16 Oct 2019 07:41:39 +0300 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: Hi In case you want to review with ls -l the POSIX permissions, please put the relevant permissions on the SMB share, and add CREATOROWNER & CREATETORGROUP. Than ls -l will show you the owner + group + everyone permissions. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: Jonathan Buzzard To: "gpfsug-discuss at spectrumscale.org" Date: 15/10/2019 23:34 Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Sent by: gpfsug-discuss-bounces at spectrumscale.org On 15/10/2019 17:15, Paul Ward wrote: [SNIP] >> ...I am not sure why you need POSIX ACL's if you are running Linux... > From what I have recently read... > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm > "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." > Only if you are using the inbuilt kernel NFS server, which IMHO is awful from a management perspective. That is you have zero visibility into what the hell it is doing when it all goes pear shaped unless you break out dtrace. I am not sure that using dtrace on a production service to find out what is going on is "best practice". It also in my experience stops you cleanly shutting down most of the time. The sooner it gets removed from the kernel the better IMHO. If you are using protocol nodes which is the only supported option as far as I am aware then that does not apply. I would imagined if you are rolling your own Ganesha NFS server it won't matter either. Checking the code of the FSAL in Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. My understanding was one of the drivers for using Ganesha as an NFS server with GPFS was you can write a FSAL to do just that, in the same way as on Samba you load the vfs_gpfs module, unless you are into self flagellation I guess. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=b8w1GtIuT4M2ayhd-sZvIeIGVRrqM7QoXlh1KVj4Zq4&s=huFx7k3Vx10aZ-7AVq1HSVo825JPWVdFaEu3G3Dh-78&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3847 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4266 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 3747 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3793 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4301 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3739 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3855 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4338 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Oct 16 09:21:46 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 16 Oct 2019 08:21:46 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Oct 16 09:25:22 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 16 Oct 2019 08:25:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale Erasure Code Edition (ECE) RedPaper Draft is public now Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Oct 16 10:35:44 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 16 Oct 2019 09:35:44 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: On Wed, 2019-10-16 at 08:21 +0000, Malahal R Naineni wrote: > >> Ganesha shows functions for converting between GPFS ACL's and the > ACL format as used by Ganesha. > > Ganesha only supports NFSv4 ACLs, so the conversion is a quick one. > kernel NFS server converts NFSv4 ACLs to POSIX ACLs (the mapping > isn't perfect) as many of the Linux file systems only support POSIX > ACLs (at least this was the behavior). > Yes but the point is you don't need POSIX ACL's on your file system if you are doing NFS exports if you use Ganesha as your NFS server and only do NFSv4 exports. It is then down to the client to deal with the ACL's which the Linux client does. In fact it has for as long as I can remember. There are even tools to manipulate the NFSv4 ACL's (see nfs4- acl-tools on RHEL and derivatives). What's missing is "rich ACL" support in the Linux kernel. www.bestbits.at/richacl/ which seems to be down at the moment. Though there has been activity on the user space utilities. https://github.com/andreas-gruenbacher/richacl/ Is it possible to get IBM to devote some resources to moving this along. It would make using GPFS on Linux with ACL's a more pleasant experience. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From p.ward at nhm.ac.uk Wed Oct 16 11:59:03 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Wed, 16 Oct 2019 10:59:03 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: We are running GPFS 4.2.3 with Arcpix build 3.5.10 or 3.5.12. We don't have Ganesha in the build. I'm not sure about the NFS service. 
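(A quick way to check which NFS stack is actually serving, assuming systemd on the box:

   systemctl is-active nfs-ganesha    # CES / Ganesha
   systemctl is-active nfs-server     # kernel nfsd
   ps -C ganesha.nfsd -o pid=,comm=

whichever of those reports something running is the one doing the exports.)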
Thanks for the responses, its interesting how the discussion has branched into Ganesha and what ACL changes are picked up by Spectrum Protect and mmbackup (my next major change). Any more responses on what is the best practice for the default POSIX owner and group of files and folders, when NFSv4 ACLs are used for SMB shares? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 16 October 2019 10:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Wed, 2019-10-16 at 08:21 +0000, Malahal R Naineni wrote: >> Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. Ganesha only supports NFSv4 ACLs, so the conversion is a quick one. kernel NFS server converts NFSv4 ACLs to POSIX ACLs (the mapping isn't perfect) as many of the Linux file systems only support POSIX ACLs (at least this was the behavior). Yes but the point is you don't need POSIX ACL's on your file system if you are doing NFS exports if you use Ganesha as your NFS server and only do NFSv4 exports. It is then down to the client to deal with the ACL's which the Linux client does. In fact it has for as long as I can remember. There are even tools to manipulate the NFSv4 ACL's (see nfs4- acl-tools on RHEL and derivatives). What's missing is "rich ACL" support in the Linux kernel. https://l.antigena.com/l/wElAOKB71BMteh5p3MJsrMJ1piEPqSzVv7jGE7WAADAaMiBDMV~~SJdC~qYZEePn7-JksRn9_H6cg21GWyrYE77TnWcAWsMEnF3Nwuug0tRR7ud7GDl9vPM3iafYImA3LyGuQInuXsXilJ6R9e2qmotMPRr~Lsq9CHJ2fsu1dBR1EL622lakpWuKLhjucFNsxUODYLWWFMzVbWj_AigKVAIMEX8Xqs0hGKXpOmjJOTejZDjM8bOCA1-jl06wU3DoT-ad3latFOtGR-oTHHwhAmu792L7Grmas12aetAuhTHnCQ6BBtRLGR_-iVJFYKfdyJNMVsDeKcBEBKKFSZdF~7ozqBouoIAZPE6cOA8KQIeh6mt1~_n which seems to be down at the moment. Though there has been activity on the user space utilities. https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fandreas-gruenbacher%2Frichacl%2F&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C2c1e0145dadd4d35842508d7521c4b9c%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637068153793755413&sdata=aUmCoKIC1N5TU95ILatCp2IlmdJ1gKKL8y%2F1V3kWb3M%3D&reserved=0 Is it possible to get IBM to devote some resources to moving this along. It would make using GPFS on Linux with ACL's a more pleasant experience. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C2c1e0145dadd4d35842508d7521c4b9c%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637068153793755413&sdata=ZXLszye50npdSFIu1FuLK3eDbUd%2BV5h29xP1N3XD0jQ%3D&reserved=0 From stockf at us.ibm.com Wed Oct 16 12:14:46 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Oct 2019 11:14:46 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... 
URL: From TROPPENS at de.ibm.com Wed Oct 16 13:51:25 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 16 Oct 2019 14:51:25 +0200 Subject: [gpfsug-discuss] Nov 5 - Spectrum Scale China User Meeting Message-ID: IBM will host a Spectrum Scale User Meeting on November 5 in Shanghai. Senior engineers of our development lab in Beijing will attend and present. Please register here: https://www.spectrumscaleug.org/event/spectrum-scale-china-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at esquad.de Wed Oct 16 17:00:00 2019 From: lists at esquad.de (Dieter Mosbach) Date: Wed, 16 Oct 2019 18:00:00 +0200 Subject: [gpfsug-discuss] SMB support on ppc64LE / SLES for SpectrumScale - please vote for RFE Message-ID: <89482a10-bb53-4b49-d37f-7ef2efb28b30@esquad.de> We want to use smb-protocol-nodes for a HANA-SpectrumScale cluster, unfortunately these are only available for RHEL and not for SLES. SLES has a market share of 99% in the HANA environment. I have therefore created a Request for Enhancement (RFE). https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=137250 If you need it, too, please vote for it! Thank you very much! Kind regards Dieter -- Unix and Storage System Engineer HORNBACH-Baumarkt AG Bornheim, Germany From jonathan.buzzard at strath.ac.uk Wed Oct 16 22:32:50 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 16 Oct 2019 21:32:50 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> Message-ID: On 15/10/2019 16:41, Simon Thompson wrote: > I thought Spectrum Protect didn't actually backup again on a file > owner change. Sure mmbackup considers it, but I think Protect just > updates the metadata. There are also some other options for dsmc that > can stop other similar issues if you change ctime maybe. > > (Other backup tools are available) > It certainly used too. I spent six months carefully chown'ing files one user at a time so as not to overwhelm the backup, because the first group I did meant no backup for about a week... I have not kept a close eye on it and have just worked on the assumption for the last decade of "don't do that". If it is no longer the case I apologize for spreading incorrect information. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Wed Oct 16 22:46:48 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 16 Oct 2019 21:46:48 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> Message-ID: <20191016214648.pnmjmc65e6d4amqi@utumno.gs.washington.edu> On Wed, Oct 16, 2019 at 09:32:50PM +0000, Jonathan Buzzard wrote: > On 15/10/2019 16:41, Simon Thompson wrote: > > I thought Spectrum Protect didn't actually backup again on a file > > owner change. Sure mmbackup considers it, but I think Protect just > > updates the metadata. 
There are also some other options for dsmc that > > can stop other similar issues if you change ctime maybe. > > > > (Other backup tools are available) > > > > It certainly used too. I spent six months carefully chown'ing files one > user at a time so as not to overwhelm the backup, because the first > group I did meant no backup for about a week... > > I have not kept a close eye on it and have just worked on the assumption > for the last decade of "don't do that". If it is no longer the case I > apologize for spreading incorrect information. TSM can store some amount of metadata in its database without spilling over to a storage pool, so whether a metadata update is cheap or expensive depends not just on ACLs/extended attributes but also the directory entry name length. It can definitely make for some seemingly non-deterministic backup behavior. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From jonathan.buzzard at strath.ac.uk Thu Oct 17 11:26:45 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 17 Oct 2019 10:26:45 +0000 Subject: [gpfsug-discuss] mmbackup questions Message-ID: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> I have been looking to give mmbackup another go (a very long history with it being a pile of steaming dinosaur droppings last time I tried, but that was seven years ago). Anyway having done a backup last night I am curious about something that does not appear to be explained in the documentation. Basically the output has a line like the following Total number of objects inspected: 474630 What is this number? Is it the number of files that have changed since the last backup or something else as it is not the number of files on the file system by any stretch of the imagination. One would hope that it inspected everything on the file system... Also it appears that the shadow database is held on the GPFS file system that is being backed up. Is there any way to change the location of that? I am only using one node for backup (because I am cheap and don't like paying for more PVU's than I need to) and would like to hold it on the node doing the backup where I can put it on SSD. Which does to things firstly hopefully goes a lot faster, and secondly reduces the impact on the file system of the backup. Anyway a significant speed up (assuming it worked) was achieved but I note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load average never went above one) and we didn't touch the swap despite only have 24GB of RAM. Though the 10GbE networking did get busy during the transfer of data to the TSM server bit of the backup but during the "assembly stage" it was all a bit quiet, and the DSS-G server nodes where not busy either. What options are there for tuning things because I feel it should be able to go a lot faster. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Thu Oct 17 13:35:18 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 17 Oct 2019 12:35:18 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Thu Oct 17 15:17:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 17 Oct 2019 10:17:17 -0400 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: Along with what Fred wrote, you can look at the mmbackup doc and also peek into the script and find some options to look at the mmapplypolicy RULEs used, and also capture the mmapplypolicy output which will better show you which files and directories are being examined and so forth. --marc From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 10/17/2019 08:43 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmbackup questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Jonathan the "objects inspected" refers to the number of file system objects that matched the policy rules used for the backup. These rules are influenced by TSM server and client settings, e.g. the dsm.sys file. So not all objects in the file system are actually inspected. As for tuning I think the mmbackup man page is the place to start, and I think it is thorough in its description of the tuning options. You may also want to look at the mmapplypolicy man page since mmbackup invokes it to scan the file system for files that need to be backed up. To my knowledge there are no options to place the shadow database file in another location than the GPFS file system. If the file system has fast storage I see no reason why you could not use a placement policy rule to place the shadow database on that fast storage. However, I think using more than one node for your backups, and adjusting the various threads used by mmbackup will provide you with sufficient performance improvements. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Jonathan Buzzard Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] mmbackup questions Date: Thu, Oct 17, 2019 8:00 AM I have been looking to give mmbackup another go (a very long history with it being a pile of steaming dinosaur droppings last time I tried, but that was seven years ago). Anyway having done a backup last night I am curious about something that does not appear to be explained in the documentation. Basically the output has a line like the following Total number of objects inspected: 474630 What is this number? Is it the number of files that have changed since the last backup or something else as it is not the number of files on the file system by any stretch of the imagination. One would hope that it inspected everything on the file system... Also it appears that the shadow database is held on the GPFS file system that is being backed up. Is there any way to change the location of that? I am only using one node for backup (because I am cheap and don't like paying for more PVU's than I need to) and would like to hold it on the node doing the backup where I can put it on SSD. Which does to things firstly hopefully goes a lot faster, and secondly reduces the impact on the file system of the backup. Anyway a significant speed up (assuming it worked) was achieved but I note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load average never went above one) and we didn't touch the swap despite only have 24GB of RAM. 
Though the 10GbE networking did get busy during the transfer of data to the TSM server bit of the backup but during the "assembly stage" it was all a bit quiet, and the DSS-G server nodes where not busy either. What options are there for tuning things because I feel it should be able to go a lot faster. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=u_URaXsFxbEw29QGkpa5CnXVGJApxske9lAtEPlerYY&s=mWDp7ziqYJ65-FSCOArzVITL9_qBunPqZ9uC9jgjxn8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From skylar2 at uw.edu Thu Oct 17 15:26:03 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 17 Oct 2019 14:26:03 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: > I have been looking to give mmbackup another go (a very long history > with it being a pile of steaming dinosaur droppings last time I tried, > but that was seven years ago). > > Anyway having done a backup last night I am curious about something > that does not appear to be explained in the documentation. > > Basically the output has a line like the following > > Total number of objects inspected: 474630 > > What is this number? Is it the number of files that have changed since > the last backup or something else as it is not the number of files on > the file system by any stretch of the imagination. One would hope that > it inspected everything on the file system... I believe this is the number of paths that matched some include rule (or didn't match some exclude rule) for mmbackup. I would assume it would differ from the "total number of objects backed up" line if there were include/exclude rules that mmbackup couldn't process, leaving it to dsmc to decide whether to process. > Also it appears that the shadow database is held on the GPFS file system > that is being backed up. Is there any way to change the location of that? > I am only using one node for backup (because I am cheap and don't like > paying for more PVU's than I need to) and would like to hold it on the > node doing the backup where I can put it on SSD. Which does to things > firstly hopefully goes a lot faster, and secondly reduces the impact on > the file system of the backup. I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment variable noted in the mmbackup man path: Specifies an alternative directory name for storing all temporary and permanent records for the backup. The directory name specified must be an existing directory and it cannot contain special characters (for example, a colon, semicolon, blank, tab, or comma). 
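Untested, but presumably the usage would be something along these lines (the SSD path, file system device and node name below are made up for illustration):

  # hypothetical local SSD scratch directory on the node driving the backup
  export MMBACKUP_RECORD_ROOT=/ssd/mmbackup-work
  mmbackup gpfs0 -t incremental -N backupnode01 --tsm-servers TSMSERVER1

The -s/-g work-directory options in the same man page look like they would keep the temporary policy scan files off the backed-up file system as well.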
Which seems like it might provide a mechanism to store the shadow database elsewhere. For us, though, we provide storage via a cost center, so we would want our customers to eat the full cost of their excessive file counts. > Anyway a significant speed up (assuming it worked) was achieved but I > note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load > average never went above one) and we didn't touch the swap despite only > have 24GB of RAM. Though the 10GbE networking did get busy during the > transfer of data to the TSM server bit of the backup but during the > "assembly stage" it was all a bit quiet, and the DSS-G server nodes where > not busy either. What options are there for tuning things because I feel > it should be able to go a lot faster. We have some TSM nodes (corresponding to GPFS filesets) that stress out our mmbackup cluster at the sort step of mmbackup. UNIX sort is not RAM-friendly, as it happens. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From jonathan.buzzard at strath.ac.uk Thu Oct 17 19:04:47 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 17 Oct 2019 18:04:47 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> Message-ID: <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> On 17/10/2019 15:26, Skylar Thompson wrote: > On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: >> I have been looking to give mmbackup another go (a very long history >> with it being a pile of steaming dinosaur droppings last time I tried, >> but that was seven years ago). >> >> Anyway having done a backup last night I am curious about something >> that does not appear to be explained in the documentation. >> >> Basically the output has a line like the following >> >> Total number of objects inspected: 474630 >> >> What is this number? Is it the number of files that have changed since >> the last backup or something else as it is not the number of files on >> the file system by any stretch of the imagination. One would hope that >> it inspected everything on the file system... > > I believe this is the number of paths that matched some include rule (or > didn't match some exclude rule) for mmbackup. I would assume it would > differ from the "total number of objects backed up" line if there were > include/exclude rules that mmbackup couldn't process, leaving it to dsmc to > decide whether to process. > After digging through dsminstr.log it would appear to be the sum of the combination of new, changed and deleted files that mmbackup is going to process. There is some wierd sh*t going on though with mmbackup on the face of it, where it sends one file to the TSM server. A line with the total number of files in the file system (aka potential backup candidates) would be nice I think. >> Also it appears that the shadow database is held on the GPFS file system >> that is being backed up. Is there any way to change the location of that? >> I am only using one node for backup (because I am cheap and don't like >> paying for more PVU's than I need to) and would like to hold it on the >> node doing the backup where I can put it on SSD. 
Which does to things >> firstly hopefully goes a lot faster, and secondly reduces the impact on >> the file system of the backup. > > I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment > variable noted in the mmbackup man path: > > Specifies an alternative directory name for > storing all temporary and permanent records for > the backup. The directory name specified must > be an existing directory and it cannot contain > special characters (for example, a colon, > semicolon, blank, tab, or comma). > > Which seems like it might provide a mechanism to store the shadow database > elsewhere. For us, though, we provide storage via a cost center, so we > would want our customers to eat the full cost of their excessive file counts. > We have set a file quota of one million for all our users. So far only one users has actually needed it raising. It does however make users come and have a conversation with us about what they are doing. With the one exception they have found ways to do their work without abusing the file system as a database. We don't have a SSD storage pool on the file system so moving it to the backup node for which we can add SSD cheaply (I mean really really cheap these days) is more realistic that adding some SSD for a storage pool to the file system. Once I am a bit more familiar with it I will try changing it to the system disks. It's not SSD at the moment but if it works I can easily justify getting some and replacing the existing drives (it would just be two RAID rebuilds away). Last time it was brought up you could not add extra shelves to an existing DSS-G system, you had to buy a whole new one. This is despite the servers shipping with a full complement of SAS cards and a large box full of 12Gbps SAS cables (well over ?1000 worth at list I reckon) that are completely useless. Ok they work and I could use them elsewhere but frankly why ship them if I can't expand!!! >> Anyway a significant speed up (assuming it worked) was achieved but I >> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load >> average never went above one) and we didn't touch the swap despite only >> have 24GB of RAM. Though the 10GbE networking did get busy during the >> transfer of data to the TSM server bit of the backup but during the >> "assembly stage" it was all a bit quiet, and the DSS-G server nodes where >> not busy either. What options are there for tuning things because I feel >> it should be able to go a lot faster. > > We have some TSM nodes (corresponding to GPFS filesets) that stress out our > mmbackup cluster at the sort step of mmbackup. UNIX sort is not > RAM-friendly, as it happens. > I have configured more monitoring of the system, and will watch it over the coming days, but nothing was stressed on our system at all as far as I can tell but it was going slower than I had hoped. It was still way faster than a traditional dsmc incr but I was hoping for more though I am not sure why as the backup now takes place well inside my backup window. Perhaps I am being greedy. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Thu Oct 17 19:37:28 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Oct 2019 18:37:28 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> Message-ID: Mmbackup uses tsbuhelper internally. This is effectively a diff of the previous and current policy scan. Objects inspected is the count of these files that are changed since the last time and these are the candidates sent to the TSM server. You mention not being able to upgrade a DSS-G, I thought this has been available for sometime as a special bid process. We did something very complicated with ours at one point. I also thought the "no-upgrade" was related to a support position from IBM on creating additional DAs. You can't add new storage to an DA, but believe it's possible and now supported (I think) to add expansion shelves into a new DA. (I think ESS also supports this). Note that you don't necessarily get the same performance of doing this as if you'd purchased a fully stacked system in the first place. For example if you initially had 166 drives as a two expansion system and then add 84 drives in a new expansion, you now have two DAs, one smaller than the other and neither the same as if you'd originally created it with 250 drives... I don't actually have any benchmarks to prove this, but it was my understanding from various discussions over time. There are also now both DSS (and ESS) configs with both spinning and SSD enclosures. I assume these aren't special bid only products anymore. Simon ?On 17/10/2019, 19:05, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 17/10/2019 15:26, Skylar Thompson wrote: > On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: >> I have been looking to give mmbackup another go (a very long history >> with it being a pile of steaming dinosaur droppings last time I tried, >> but that was seven years ago). >> >> Anyway having done a backup last night I am curious about something >> that does not appear to be explained in the documentation. >> >> Basically the output has a line like the following >> >> Total number of objects inspected: 474630 >> >> What is this number? Is it the number of files that have changed since >> the last backup or something else as it is not the number of files on >> the file system by any stretch of the imagination. One would hope that >> it inspected everything on the file system... > > I believe this is the number of paths that matched some include rule (or > didn't match some exclude rule) for mmbackup. I would assume it would > differ from the "total number of objects backed up" line if there were > include/exclude rules that mmbackup couldn't process, leaving it to dsmc to > decide whether to process. > After digging through dsminstr.log it would appear to be the sum of the combination of new, changed and deleted files that mmbackup is going to process. There is some wierd sh*t going on though with mmbackup on the face of it, where it sends one file to the TSM server. A line with the total number of files in the file system (aka potential backup candidates) would be nice I think. >> Also it appears that the shadow database is held on the GPFS file system >> that is being backed up. 
Is there any way to change the location of that? >> I am only using one node for backup (because I am cheap and don't like >> paying for more PVU's than I need to) and would like to hold it on the >> node doing the backup where I can put it on SSD. Which does to things >> firstly hopefully goes a lot faster, and secondly reduces the impact on >> the file system of the backup. > > I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment > variable noted in the mmbackup man path: > > Specifies an alternative directory name for > storing all temporary and permanent records for > the backup. The directory name specified must > be an existing directory and it cannot contain > special characters (for example, a colon, > semicolon, blank, tab, or comma). > > Which seems like it might provide a mechanism to store the shadow database > elsewhere. For us, though, we provide storage via a cost center, so we > would want our customers to eat the full cost of their excessive file counts. > We have set a file quota of one million for all our users. So far only one users has actually needed it raising. It does however make users come and have a conversation with us about what they are doing. With the one exception they have found ways to do their work without abusing the file system as a database. We don't have a SSD storage pool on the file system so moving it to the backup node for which we can add SSD cheaply (I mean really really cheap these days) is more realistic that adding some SSD for a storage pool to the file system. Once I am a bit more familiar with it I will try changing it to the system disks. It's not SSD at the moment but if it works I can easily justify getting some and replacing the existing drives (it would just be two RAID rebuilds away). Last time it was brought up you could not add extra shelves to an existing DSS-G system, you had to buy a whole new one. This is despite the servers shipping with a full complement of SAS cards and a large box full of 12Gbps SAS cables (well over ?1000 worth at list I reckon) that are completely useless. Ok they work and I could use them elsewhere but frankly why ship them if I can't expand!!! >> Anyway a significant speed up (assuming it worked) was achieved but I >> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load >> average never went above one) and we didn't touch the swap despite only >> have 24GB of RAM. Though the 10GbE networking did get busy during the >> transfer of data to the TSM server bit of the backup but during the >> "assembly stage" it was all a bit quiet, and the DSS-G server nodes where >> not busy either. What options are there for tuning things because I feel >> it should be able to go a lot faster. > > We have some TSM nodes (corresponding to GPFS filesets) that stress out our > mmbackup cluster at the sort step of mmbackup. UNIX sort is not > RAM-friendly, as it happens. > I have configured more monitoring of the system, and will watch it over the coming days, but nothing was stressed on our system at all as far as I can tell but it was going slower than I had hoped. It was still way faster than a traditional dsmc incr but I was hoping for more though I am not sure why as the backup now takes place well inside my backup window. Perhaps I am being greedy. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Fri Oct 18 02:18:04 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 18 Oct 2019 01:18:04 +0000 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> Found my notes on this; very similar to what Behrooz was saying. This here is from ?mmfsadm dump waiters,selected_files?; as you can see here, we?re looking at thread 29168. Apparently below, ?inodeFlushHolder? corresponds to that same thread in the case I was looking at. You could then look up the inode with ?tsfindinode -i ?, so like for the below, "tsfindinode -i 41538053 /gpfs/cache? on our system. ===== dump waiters ==== Current time 2019-05-01_13:48:26-0400 Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 ===== dump selected_files ===== Current time 2019-05-01_13:48:36-0400 ... OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8 Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823 lock state [ wf: 1 ] x [] flags [ ] Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823 DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821 SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D: ) seq 170823 lock state [ M(2) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 (pfro+pfxw) seq 170822 BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823 treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823 treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw------- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0 locks held in mode xw: 0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0 BRL nXLocksOrRelinquishes 285 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 hasWriterInstance 1 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1 metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 3 dirty status: flushed dirtiedSyncNum 1477623 SMB oplock state: nWriters 1 indBlockDeallocLock: sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0 inodeValid 1 objectVersion 240 flushVersion 8086700 mnodeChangeCount 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 0 synchedFileSize 0 indirectionLevel 1 atime 1556732911.496160000 mtime 1556732911.496479000 ctime 1556732911.496479000 crtime 1556732911.496160000 owner uid 169589 gid 169589 > On Oct 10, 2019, at 4:43 PM, Damir Krstic wrote: > > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? > > in all my looking i have not been able to get that information out of various diagnostic commands. 
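To condense the above into a rough recipe (the thread id and inode number are just the ones from this example, and mmfsadm is an unsupported diagnostic command, so use with care):

  mmdiag --waiters                                  # spot the long waiter and note its thread id
  mmfsadm dump waiters,selected_files > /tmp/dump   # find the OpenFile entry whose inodeFlushHolder matches that thread id
  tsfindinode -i 41538053 /gpfs/cache               # translate that entry's inode number back to a path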
> > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Fri Oct 18 08:58:40 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Oct 2019 07:58:40 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> Message-ID: On 17/10/2019 19:37, Simon Thompson wrote: > Mmbackup uses tsbuhelper internally. This is effectively a diff of > the previous and current policy scan. Objects inspected is the count > of these files that are changed since the last time and these are the > candidates sent to the TSM server. > > You mention not being able to upgrade a DSS-G, I thought this has > been available for sometime as a special bid process. We did > something very complicated with ours at one point. I also thought the > "no-upgrade" was related to a support position from IBM on creating > additional DAs. You can't add new storage to an DA, but believe it's > possible and now supported (I think) to add expansion shelves into a > new DA. (I think ESS also supports this). Note that you don't > necessarily get the same performance of doing this as if you'd > purchased a fully stacked system in the first place. For example if > you initially had 166 drives as a two expansion system and then add > 84 drives in a new expansion, you now have two DAs, one smaller than > the other and neither the same as if you'd originally created it with > 250 drives... I don't actually have any benchmarks to prove this, but > it was my understanding from various discussions over time. > Well it was only the beginning of this year that we asked for a quote for expanding our DSS-G as part of a wider storage upgrade that was to be put to the IT funding committee at the university. I was expecting just to need some more shelves, only to told we need to start again. Like I said if that was the case why ship with all those extra unneeded and unusable SAS cards and SAS cables. At the very least it is not environmentally friendly. Then again the spec that came back had a 2x10Gb LOM, despite the DSS-G documentation being very explicit about needing a 4x1Gb LOM, which is still the case in the 2.4b documentation as of last month. I do note odd numbers of shelves other than one is now supported. That said the tools in at least 2.1 incorrectly states having one shelf is unsupported!!! Presumably they the person writing the tool only tested for even numbers not realizing one while odd was supported. You can also mix shelf types now, but again if I wanted to add some SSD it's a new DSS-G not a couple of D1224 shelves. That also nukes the DA argument for no upgrades I think because you would not be wanting to mix the two in that way. > There are also now both DSS (and ESS) configs with both spinning and > SSD enclosures. I assume these aren't special bid only products > anymore. I don't think so, along with odd numbers of shelves they are in general Lenovo literature. They also have a node with NVMe up the front (or more accurately up the back in PCIe slots), the DSS-G100. My take on the DSS-G is that it is a cost effective way to deploy GPFS storage. 
However there are loads of seemingly arbitrary quirks and limitations, a bit sh*t crazy upgrade procedure and questionable hardware maintenance. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Fri Oct 18 09:34:01 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 18 Oct 2019 16:34:01 +0800 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> References: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> Message-ID: Right for the example from Ryan(and according to the thread name, you know that it is writing to a file or directory), but for other cases, it may take more steps to figure out what access to which file is causing the long waiters(i.e., when mmap is being used on some nodes, or token revoke pending from some node, and etc.). Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 2019/10/18 09:18 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] waiters and files causing waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org Found my notes on this; very similar to what Behrooz was saying. This here is from ?mmfsadm dump waiters,selected_files?; as you can see here, we?re looking at thread 29168. Apparently below, ?inodeFlushHolder? corresponds to that same thread in the case I was looking at. You could then look up the inode with ?tsfindinode -i ?, so like for the below, "tsfindinode -i 41538053 /gpfs/cache? on our system. ===== dump waiters ==== Current time 2019-05-01_13:48:26-0400 Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 ===== dump selected_files ===== Current time 2019-05-01_13:48:36-0400 ... 
OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8 Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823 lock state [ wf: 1 ] x [] flags [ ] Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823 DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821 SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D: ) seq 170823 lock state [ M(2) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 (pfro+pfxw) seq 170822 BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823 treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823 treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw------- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0 locks held in mode xw: 0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0 BRL nXLocksOrRelinquishes 285 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 hasWriterInstance 1 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1 metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 3 dirty status: flushed dirtiedSyncNum 1477623 SMB oplock state: nWriters 1 indBlockDeallocLock: sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0 inodeValid 1 objectVersion 240 flushVersion 8086700 mnodeChangeCount 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 0 synchedFileSize 0 indirectionLevel 1 atime 1556732911.496160000 mtime 1556732911.496479000 ctime 1556732911.496479000 crtime 1556732911.496160000 owner uid 169589 gid 169589 > On Oct 10, 2019, at 4:43 PM, Damir Krstic wrote: > > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? > > in all my looking i have not been able to get that information out of various diagnostic commands. > > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=VdmIfneKWlidYoO2I90hBuZJ2VxXu8L8oq86E7zyh8Q&s=dkQrCzMmxeh6tu0UpPgSIphmRwcBiSpL7QbZPw5RNtI&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=VdmIfneKWlidYoO2I90hBuZJ2VxXu8L8oq86E7zyh8Q&s=dkQrCzMmxeh6tu0UpPgSIphmRwcBiSpL7QbZPw5RNtI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at well.ox.ac.uk Tue Oct 22 10:12:31 2019 From: jon at well.ox.ac.uk (Jon Diprose) Date: Tue, 22 Oct 2019 09:12:31 +0000 Subject: [gpfsug-discuss] AMD Rome support? Message-ID: Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? 
Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN From knop at us.ibm.com Tue Oct 22 17:30:38 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 22 Oct 2019 12:30:38 -0400 Subject: [gpfsug-discuss] AMD Rome support? In-Reply-To: References: Message-ID: Jon, AMD processors which are completely compatible with Opteron should also work. Please also refer to Q5.3 on the SMP scaling limit: 64 cores: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Jon Diprose To: gpfsug main discussion list Date: 10/22/2019 05:13 AM Subject: [EXTERNAL] [gpfsug-discuss] AMD Rome support? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=eizQJGD_5DpnaQUqNkIE3V9qJciVjfLCgo4ZHixZ5Ns&s=JomlTDVPlwFCvLtVOmGd4J6FrfbUK6cMVlLe5Ut638U&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Tue Oct 22 19:40:36 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Oct 2019 18:40:36 +0000 Subject: [gpfsug-discuss] AMD Rome support? In-Reply-To: References: Message-ID: <1c594dbd-4f5c-45aa-57aa-6b610d5c0e86@strath.ac.uk> On 22/10/2019 17:30, Felipe Knop wrote: > Jon, > > AMD processors which are completely compatible with Opteron should also > work. > > Please also refer to Q5.3 on the SMP scaling limit: 64 cores: > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > Hum, is that per CPU or the total for a machine? The reason I ask is we have some large memory nodes (3TB of RAM) and these are quad Xeon 6138 CPU's giving a total of 80 cores in the machine... We have not seen any problems, but if it is 64 cores per machine IBM needs to do some scaling testing ASAP to raise the limit as 64 cores per machine in 2019 is ridiculously low. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Stephan.Peinkofer at lrz.de Wed Oct 23 06:00:44 2019 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Wed, 23 Oct 2019 05:00:44 +0000 Subject: [gpfsug-discuss] AMD Rome support? 
In-Reply-To: References: Message-ID: <0E081EFD-538E-4E00-A625-54B99F57D960@lrz.de> Dear Jon, we run a bunch of AMD EPYC Naples Dual Socket servers with GPFS in our TSM Server Cluster. From what I can say it runs stable, but IO performance in general and GPFS performance in particular - even compared to an Xeon E5 v3 system - is rather poor. So to put that into perspective on the Xeon Systems with two EDR IB Links, we get 20GB/s read and write performance to GPFS using iozone very easily. On the AMD systems - with all AMD EPYC tuning suggestions applied you can find in the internet - we get around 15GB/s write but only 6GB/s read. We also opened a ticket at IBM for this but never found out anything. Probably because not many are running GPFS on AMD EPYC right now? The answer from AMD basically was that the bad IO performance is expected in Dual Socket systems because the Socket Interconnect is the bottleneck. (See also the IB tests DELL did https://www.dell.com/support/article/de/de/debsdt1/sln313856/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study?lang=en as soon as you have to cross the socket border you get only half of the IB performance) Of course with ROME everything get?s better (that?s what AMD told us through our vendor) but if you have the chance then I would recommend to benchmark AMD vs. XEON with your particular IO workloads before buying. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. (TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de On 22. Oct 2019, at 11:12, Jon Diprose > wrote: Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose > Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ivano.Talamo at psi.ch Wed Oct 23 10:49:02 2019 From: Ivano.Talamo at psi.ch (Talamo Ivano Giuseppe (PSI)) Date: Wed, 23 Oct 2019 09:49:02 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: Dear all, We are actually in the process of upgrading our CES cluster to 5.0.3-3 but we have doubts about how to proceed. Considering that the CES cluster is in production and heavily used, our plan is to add a new node with 5.0.3-3 to the cluster that is currently 5.0.2.1. And we would like to proceed in a cautious way, so that the new node would not take any IP and just one day per week (when we will declare to be ?at risk?) we would move some IPs to it. After some weeks of tests if we would see no problem we would upgrade the rest of the cluster. But reading these doc [1] it seems that we cannot have multiple GPFS/SMB version in the same cluster. So in that case we could not have a testing/acceptance phase but could only make the full blind jump. 
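If it helps frame the question, the constraint we are worried about is that the installed SMB stack has to be identical on every protocol node, so something like the following is presumably the check that has to hold at every step (mmdsh usage is from memory; any parallel shell would do):

  mmces node list                   # list the protocol nodes
  mmdsh -N all "rpm -q gpfs.smb"    # the gpfs.smb package version must match on all of them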
Can someone confirm or negate this? Thanks, Ivano [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_updatingsmb.htm On 04.10.19, 12:55, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Malahal R Naineni" wrote: You can use 5.0.3.3 . There is no fix for the sssd issue yet though. I will work with Ganesha upstream community pretty soon. Regards, Malahal. ----- Original message ----- From: Leonardo Sala To: gpfsug main discussion list , "Malahal R Naineni" , Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Fri, Oct 4, 2019 12:02 PM Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: >> @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Hi Ulrich, Ganesha uses innetgr() call for netgroup information and sssd has too many issues in its implementation. Redhat said that they are going to fix sssd synchronization issues in RHEL8. It is in my plate to serialize innergr() call in Ganesha to match kernel NFS server usage! I expect the sssd issue to give EACCESS/EPERM kind of issue but not EINVAL though. If you are using sssd, you must be getting into a sssd issue. Ganesha has a host-ip cache fix in 5.0.2 PTF3. Please make sure you use ganesha version V2.5.3-ibm030.01 if you are using netgroups (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) Regards, Malahal. ----- Original message ----- From: Ulrich Sibiller Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Thu, Dec 13, 2018 7:32 PM On 23.11.2018 14:41, Andreas Mattsson wrote: > Yes, this is repeating. > > We?ve ascertained that it has nothing to do at all with file operations on the GPFS side. > > Randomly throughout the filesystem mounted via NFS, ls or file access will give > > ? > > > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument > > ? > > Trying again later might work on that folder, but might fail somewhere else. > > We have tried exporting the same filesystem via a standard kernel NFS instead of the CES > Ganesha-NFS, and then the problem doesn?t exist. > > So it is definitely related to the Ganesha NFS server, or its interaction with the file system. > > Will see if I can get a tcpdump of the issue. We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is the culprit. 
Here some FULL_DEBUG output: 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , ) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18 The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1". I have also opened a support case at IBM for this. @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Kind regards, Ulrich Sibiller -- Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 72070 Tuebingen, Germany https://atos.net/de/deutschland/sc -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Wed Oct 23 10:56:57 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 23 Oct 2019 09:56:57 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch><717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 23 11:14:23 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 23 Oct 2019 10:14:23 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> From our experience, you can generally upgrade the GPFS code node by node, but the SMB code has to be identical on all nodes. So that's basically a do it one day and cross your fingers it doesn't break moment... but it is disruptive as well as you have to stop SMB to do the upgrade. I think there is a long standing RFE open on this about non disruptive SMB upgrades... Simon ?On 23/10/2019, 10:49, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ivano.Talamo at psi.ch" wrote: Dear all, We are actually in the process of upgrading our CES cluster to 5.0.3-3 but we have doubts about how to proceed. Considering that the CES cluster is in production and heavily used, our plan is to add a new node with 5.0.3-3 to the cluster that is currently 5.0.2.1. And we would like to proceed in a cautious way, so that the new node would not take any IP and just one day per week (when we will declare to be ?at risk?) we would move some IPs to it. After some weeks of tests if we would see no problem we would upgrade the rest of the cluster. But reading these doc [1] it seems that we cannot have multiple GPFS/SMB version in the same cluster. So in that case we could not have a testing/acceptance phase but could only make the full blind jump. Can someone confirm or negate this? Thanks, Ivano [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_updatingsmb.htm On 04.10.19, 12:55, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Malahal R Naineni" wrote: You can use 5.0.3.3 . There is no fix for the sssd issue yet though. I will work with Ganesha upstream community pretty soon. Regards, Malahal. ----- Original message ----- From: Leonardo Sala To: gpfsug main discussion list , "Malahal R Naineni" , Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Fri, Oct 4, 2019 12:02 PM Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? 
We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: >> @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Hi Ulrich, Ganesha uses innetgr() call for netgroup information and sssd has too many issues in its implementation. Redhat said that they are going to fix sssd synchronization issues in RHEL8. It is in my plate to serialize innergr() call in Ganesha to match kernel NFS server usage! I expect the sssd issue to give EACCESS/EPERM kind of issue but not EINVAL though. If you are using sssd, you must be getting into a sssd issue. Ganesha has a host-ip cache fix in 5.0.2 PTF3. Please make sure you use ganesha version V2.5.3-ibm030.01 if you are using netgroups (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) Regards, Malahal. ----- Original message ----- From: Ulrich Sibiller Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Thu, Dec 13, 2018 7:32 PM On 23.11.2018 14:41, Andreas Mattsson wrote: > Yes, this is repeating. > > We?ve ascertained that it has nothing to do at all with file operations on the GPFS side. > > Randomly throughout the filesystem mounted via NFS, ls or file access will give > > ? > > > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument > > ? > > Trying again later might work on that folder, but might fail somewhere else. > > We have tried exporting the same filesystem via a standard kernel NFS instead of the CES > Ganesha-NFS, and then the problem doesn?t exist. > > So it is definitely related to the Ganesha NFS server, or its interaction with the file system. > > Will see if I can get a tcpdump of the issue. We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is the culprit. 
Here some FULL_DEBUG output: 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , ) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18 The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1". I have also opened a support case at IBM for this. @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Kind regards, Ulrich Sibiller -- Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 72070 Tuebingen, Germany https://atos.net/de/deutschland/sc -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Oct 23 12:20:18 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Oct 2019 11:20:18 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: On Wed, 2019-10-23 at 10:14 +0000, Simon Thompson wrote: > From our experience, you can generally upgrade the GPFS code node by > node, but the SMB code has to be identical on all nodes. So that's > basically a do it one day and cross your fingers it doesn't break > moment... but it is disruptive as well as you have to stop SMB to do > the upgrade. I think there is a long standing RFE open on this about > non disruptive SMB upgrades... > My understanding is that the issue is the ctdb database suffers from basically being a "memory dump", so a change in the code can effect the database so all the nodes have to be the same. It's the same issue that historically plagued Microsoft Office file formats. Though of course you might get lucky and it just works. I have in the past in the days of role your own because there was no such thing as IBM provided Samba for GPFS done exactly that on several occasions. There was not warnings not to at the time... If you want to do testing before deployment a test cluster is the way forward. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Ivano.Talamo at psi.ch Wed Oct 23 12:23:22 2019 From: Ivano.Talamo at psi.ch (Talamo Ivano Giuseppe (PSI)) Date: Wed, 23 Oct 2019 11:23:22 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: Yes, thanks for the feedback. We already have a test cluster, so I guess we will go that way, just making sure to stay as close as possible to the production one. Cheers, Ivano On 23.10.19, 13:20, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On Wed, 2019-10-23 at 10:14 +0000, Simon Thompson wrote: > From our experience, you can generally upgrade the GPFS code node by > node, but the SMB code has to be identical on all nodes. So that's > basically a do it one day and cross your fingers it doesn't break > moment... 
but it is disruptive as well as you have to stop SMB to do > the upgrade. I think there is a long standing RFE open on this about > non disruptive SMB upgrades... > My understanding is that the issue is the ctdb database suffers from basically being a "memory dump", so a change in the code can effect the database so all the nodes have to be the same. It's the same issue that historically plagued Microsoft Office file formats. Though of course you might get lucky and it just works. I have in the past in the days of role your own because there was no such thing as IBM provided Samba for GPFS done exactly that on several occasions. There was not warnings not to at the time... If you want to do testing before deployment a test cluster is the way forward. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From A.Wolf-Reber at de.ibm.com Wed Oct 23 14:05:24 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 23 Oct 2019 13:05:24 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch><717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se><69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397183.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397184.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397185.png Type: image/png Size: 1134 bytes Desc: not available URL: From david_johnson at brown.edu Wed Oct 23 16:19:24 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 23 Oct 2019 11:19:24 -0400 Subject: [gpfsug-discuss] question about spectrum scale 5.0.3 installer Message-ID: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu> I built a test cluster a month ago on 14 nodes. Today I want to install two more NSD nodes. When I tried to run the installer, it looks like it is going back and fiddling with the nodes that were installed earlier, and are up and running with the filesystem mounted. I ended up having to abort the install (rebooted the two new nodes because they were stuck on multpath that had had earlier errors), and the messages indicated that the installation failed on all the existing NSD and GUI nodes, but no mention of the two that I wanted to install on. Do I have anything to worry about when I try again (now that multipath is fixed)? I want to be able to incrementally add servers and clients as we go along, and not have the installer messing up previous progress. Can I tell the installer exactly which nodes to work on? Thanks, ? 
ddj Dave Johnson Brown University

From david_johnson at brown.edu  Wed Oct 23 16:33:01 2019
From: david_johnson at brown.edu (David Johnson)
Date: Wed, 23 Oct 2019 11:33:01 -0400
Subject: Re: [gpfsug-discuss] question about spectrum scale 5.0.3 installer
In-Reply-To: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu>
References: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu>
Message-ID: <54DAC656-CEFE-4AF2-BB4F-9A595DD067C4@brown.edu>

By the way, we have been dealing with adding and deleting nodes manually since GPFS 3.4, back in 2009. At what point is the spectrumscale command line utility more trouble than it's worth?

> On Oct 23, 2019, at 11:19 AM, David Johnson wrote:
>
> I built a test cluster a month ago on 14 nodes. Today I want to install two more NSD nodes.
> When I tried to run the installer, it looks like it is going back and fiddling with the nodes that
> were installed earlier, and are up and running with the filesystem mounted.
>
> I ended up having to abort the install (rebooted the two new nodes because they were stuck
> on multipath that had had earlier errors), and the messages indicated that the installation failed
> on all the existing NSD and GUI nodes, but no mention of the two that I wanted to install on.
>
> Do I have anything to worry about when I try again (now that multipath is fixed)? I want to be
> able to incrementally add servers and clients as we go along, and not have the installer
> messing up previous progress. Can I tell the installer exactly which nodes to work on?
>
> Thanks,
> -- ddj
> Dave Johnson
> Brown University

From Robert.Oesterlin at nuance.com  Thu Oct 24 15:03:25 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 24 Oct 2019 14:03:25 +0000
Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space?
Message-ID: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com>

We recently upgraded our GL4 to a GL6 (trouble free process for those considering FYI). I now have 615T free (raw) in each of my recovery groups. I'd like to increase the size of one of the file systems (currently at 660T, I'd like to add 100T). My first thought was going to be:

mmvdisk vdiskset define --vdisk-set fsdata1 --recovery-group rg_gssio1-hs,rg_gssio2-hs --set-size 50T --code 8+2p --block-size 4m --nsd-usage dataOnly --storage-pool data
mmvdisk vdiskset create --vdisk-set fs1data1
mmvdisk filesystem add --filesystem fs1 --vdisk-set fs1data1

I know in the past use of mixed size NSDs was frowned upon, not sure on the ESS. The other approach would be to add two larger NSDs (current ones are 330T) of 380T, migrate the data to the new ones using mmrestripe, then delete the old ones. The other benefit of this process would be to have the file system data better balanced across all the storage enclosures.

Any considerations before I do this? Thoughts?

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stockf at us.ibm.com  Thu Oct 24 16:54:50 2019
From: stockf at us.ibm.com (Frederick Stock)
Date: Thu, 24 Oct 2019 15:54:50 +0000
Subject: Re: [gpfsug-discuss] ESS - Considerations when adding NSD space?
In-Reply-To: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com>
References: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com>
Message-ID:

An HTML attachment was scrubbed...
URL:
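For readers following this thread, a rough sketch of the checks that usually go around such a change. The vdisk set and file system names follow the commands above; the NSD names are made up, and the exact mmvdisk option names should be double-checked against the man pages:

# how much free capacity is left per recovery group / declustered array,
# and which vdisk sets already exist
mmvdisk recoverygroup list --recovery-group rg_gssio1-hs --all
mmvdisk vdiskset list --vdisk-set all

# after adding the new vdisk set to the file system: check utilisation and,
# if wanted, rebalance existing data across all NSDs (I/O heavy, run off-peak)
mmdf fs1
mmrestripefs fs1 -b

# for the "replace with larger NSDs" variant, mmdeldisk migrates the data
# off the old NSDs as part of the deletion
mmdeldisk fs1 "old_nsd_1;old_nsd_2"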
From A.Wolf-Reber at de.ibm.com  Thu Oct 24 20:43:13 2019
From: A.Wolf-Reber at de.ibm.com (Alexander Wolf)
Date: Thu, 24 Oct 2019 19:43:13 +0000
Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space?
In-Reply-To:
References: , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From lgayne at us.ibm.com  Fri Oct 25 18:54:02 2019
From: lgayne at us.ibm.com (Lyle Gayne)
Date: Fri, 25 Oct 2019 17:54:02 +0000
Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space?
In-Reply-To:
References: , , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From olaf.weiser at de.ibm.com  Fri Oct 25 18:59:48 2019
From: olaf.weiser at de.ibm.com (Olaf Weiser)
Date: Fri, 25 Oct 2019 19:59:48 +0200
Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space?
In-Reply-To:
References: , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com>
Message-ID:

An HTML attachment was scrubbed...
URL:

From Robert.Oesterlin at nuance.com  Mon Oct 28 14:02:57 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Mon, 28 Oct 2019 14:02:57 +0000
Subject: [gpfsug-discuss] Question on CES Authentication - LDAP
Message-ID: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com>

This relates to V 5.0.3. If my CES server node has system defined authentication using LDAP, should I expect that setting my authentication setting of "userdefined" using mmuserauth to work? That doesn't seem to be the case for me. Is there some other setting I should be using?

I tried using LDAP in mmuserauth, and that promptly stomped on my sssd.conf file on that node which broke everything. And by the way, it stores a plain text password in the sssd.conf file just for good measure!

Bob Oesterlin
Sr Principal Storage Engineer, Nuance

-------------- next part --------------
An HTML attachment was scrubbed...
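On the "userdefined" question: that mode is the one intended for setups where authentication and ID mapping on the protocol nodes are managed outside of CES (for example by an already-working sssd/LDAP configuration), so mmuserauth does not generate or overwrite sssd.conf. A minimal sketch, assuming file access only and that sssd already resolves users and groups on every CES node:

# tell CES that file authentication is handled outside of mmuserauth
mmuserauth service create --data-access-method file --type userdefined

# confirm what is configured
mmuserauth service list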
URL: From valdis.kletnieks at vt.edu Mon Oct 28 17:12:08 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Mon, 28 Oct 2019 13:12:08 -0400 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP In-Reply-To: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> References: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> Message-ID: <55677.1572282728@turing-police> On Mon, 28 Oct 2019 14:02:57 -0000, "Oesterlin, Robert" said: > Any by the way, stores a plain text password in the sssd.conf file just for > good measure! Note that if you want the system to come up without intervention, at best you can only store an obfuscated password, not a securely encrypted one. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Tue Oct 29 10:14:57 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 29 Oct 2019 10:14:57 +0000 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP In-Reply-To: <55677.1572282728@turing-police> References: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> <55677.1572282728@turing-police> Message-ID: <1d324529a566cdd262a8874e48938002f9c1b4d0.camel@strath.ac.uk> On Mon, 2019-10-28 at 13:12 -0400, Valdis Kl?tnieks wrote: > On Mon, 28 Oct 2019 14:02:57 -0000, "Oesterlin, Robert" said: > > Any by the way, stores a plain text password in the sssd.conf file > > just for good measure! > > Note that if you want the system to come up without intervention, at > best you can only store an obfuscated password, not a securely > encrypted one. > Kerberos and a machine account spring to mind. Crazy given Kerberos is a Unix technology everyone seems to forget about it. Also my understanding is that in theory a TPM module in your server can be used for this https://en.wikipedia.org/wiki/Trusted_Platform_Module Support in Linux is weak at best, but basically it can be used to store passwords and it can be tied to the system. Locality and physical presence being the terminology used. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From linesr at janelia.hhmi.org Thu Oct 31 15:23:59 2019 From: linesr at janelia.hhmi.org (Lines, Robert) Date: Thu, 31 Oct 2019 15:23:59 +0000 Subject: [gpfsug-discuss] Inherited ACLs and multi-protocol access Message-ID: I know I am missing something here and it is probably due to lack of experience dealing with ACLs as all other storage we distil down to just posix UGO permissions. We have Windows native clients creating data. There are SMB clients of various flavors accessing data via CES. Then there are Linux native clients that interface between gpfs and other NFS filers for data movement. What I am running into is around inheriting permissions so that windows native and smb clients have access based on the users group membership that remains sane while also being able to migrate files off to nfs filers with reasonable posix permissions. Here is the top level directory that is the lab name and there is a matching group. That directory is the highest point where an ACL has been set with inheritance. The directory listed is one created from a Windows Native client. The issue I am running into is that that largec7 directory that was created is having the posix permissions set to nothing for the owner. 
The ACL that results is okay but when that folder or anything in it is synced off to another filer that only has the basic posix permission it acts kinda wonky. The user was able to fix up his files on the other filer because he was still the owner but I would like to make it work properly. [root at gpfs-dm1 smith]# ls -la drwxrwsr-x 84 root smith 16384 Oct 30 23:22 . d---rwsr-x 2 tim smith 4096 Oct 30 23:22 largec7 drwx--S--- 2 tim smith 4096 Oct 24 00:17 CFA1 [root at gpfs-dm1 smith]# mmgetacl . #NFSv4 ACL #owner:root #group:smith special:owner@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED [root at gpfs-dm1 smith]# mmgetacl largec7 #NFSv4 ACL #owner:tim #group:smith #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED user:root:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED In contrast the CFA1 directory was created prior to the file and directory inheritance being put in place. That worked okay as long as it was only that user but the lack of group access is a problem and what led to trying to sort out the inherited ACLs in the first place. [root at gpfs-dm1 smith]# ls -l drwx--S--- 2 tim smith 4096 Oct 24 00:17 CFA1 [root at gpfs-dm1 smith]# mmgetacl CFA1 #NFSv4 ACL #owner:tim #group:smith #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED special:owner@:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED user:15000001:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED user:15000306:r-x-:allow (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Thank you for any suggestions. -- Rob Lines Sr. HPC Engineer HHMI Janelia Research Campus -------------- next part -------------- An HTML attachment was scrubbed... 
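One way to inspect and repair directories that were already created with the stripped-down owner entry is to dump the NFSv4 ACL, put the owner entry back, and re-apply it. The path below is a placeholder for the real fileset path:

# dump the ACL of the affected directory
mmgetacl -k nfs4 -o /tmp/largec7.acl /gpfs/<filesystem>/smith/largec7

# edit /tmp/largec7.acl so that it again contains
#   special:owner@:rwxc:allow
# as the first entry, then apply it back
mmputacl -i /tmp/largec7.acl /gpfs/<filesystem>/smith/largec7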
@Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 1 16:15:00 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 1 Oct 2019 15:15:00 +0000 Subject: [gpfsug-discuss] verbsPortsOutOfOrder Message-ID: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... 
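For reference, the usual way to see what the daemon is actually using for the verbs settings, including options that are left unset:

mmdiag --config | grep -i verbs
mmlsconfig verbsPorts

mmdiag --config reports the values the running daemon has picked up (defaults included), while mmlsconfig only shows what has been set explicitly.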
URL: From TOMP at il.ibm.com Wed Oct 2 11:53:59 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 2 Oct 2019 13:53:59 +0300 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=dwtQhITjaULogq0l7wR3LfWDiy4R6tpPWq81EvnuA_o&s=LyZT2j0hkAP9pJTkYU40ZkexzkG6RFRqDcS9rSrapRc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Oct 2 18:02:06 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 2 Oct 2019 13:02:06 -0400 Subject: [gpfsug-discuss] CIFS protocol access does not honor secondary groups Message-ID: After converting from clustered CIFS to CES protocols, we?ve noticed that SMB users can?t access files owned by groups that they are members of, unless that group happens to be their primary group. Have read the smb.conf man page, and don?t see anything obvious that would control this? What might we be missing? Thanks, ? ddj Dave Johnson Brown University CCV/CIS From frederik.ferner at diamond.ac.uk Wed Oct 2 19:41:14 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 2 Oct 2019 19:41:14 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Message-ID: <0a5f042f-2715-c436-34a1-27c0ba529a70@diamond.ac.uk> Hello Heiner, very interesting, thanks. In our case we are seeing this problem on gpfs.nfs-ganesha-gpfs-2.5.3-ibm036.05.el7, so close to the version where you're seeing it. Frederik On 23/09/2019 10:33, Billich Heinrich Rainer (ID SD) wrote: > Hello Frederik, > > Thank you. I now see a similar behavior: Ganesha has 500k open files while the node is suspended since 2+hours. I would expect that some cleanup job does remove most of the open FD after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: > > ! maxFilesToCache 1048576 > ! maxStatCache 2097152 > > Our ganesha version is 2.5.3. (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern. 
> > I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. > > Cheers, > Heiner > > Daniel Gryniewicz > So, it's not impossible, based on the workload, but it may also be a bug. > > For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know > when the client closes the FD, and opening/closing all the time causes a > large performance hit. So, we cache open FDs. > > All handles in MDCACHE live on the LRU. This LRU is divided into 2 > levels. Level 1 is more active handles, and they can have open FDs. > Various operation can demote a handle to level 2 of the LRU. As part of > this transition, the global FD on that handle is closed. Handles that > are actively in use (have a refcount taken on them) are not eligible for > this transition, as the FD may be being used. > > We have a background thread that runs, and periodically does this > demotion, closing the FDs. This thread runs more often when the number > of open FDs is above FD_HwMark_Percent of the available number of FDs, > and runs constantly when the open FD count is above FD_Limit_Percent of > the available number of FDs. > > So, a heavily used server could definitely have large numbers of FDs > open. However, there have also, in the past, been bugs that would > either keep the FDs from being closed, or would break the accounting (so > they were closed, but Ganesha still thought they were open). You didn't > say what version of Ganesha you're using, so I can't tell if one of > those bugs apply. > > Daniel > > ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: > > Heiner, > > we are seeing similar issues with CES/ganesha NFS, in our case it > exclusively with NFSv3 clients. > > What is maxFilesToCache set to on your ganesha node(s)? In our case > ganesha was running into the limit of open file descriptors because > maxFilesToCache was set at a low default and for now we've increased it > to 1M. > > It seemed that ganesha was never releasing files even after clients > unmounted the file system. > > We've only recently made the change, so we'll see how much that improved > the situation. > > I thought we had a reproducer but after our recent change, I can now no > longer successfully reproduce the increase in open files not being released. > > Kind regards, > Frederik > > On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > > Hello, > > > > Is it usual to see 200?000-400?000 open files for a single ganesha > > process? Or does this indicate that something ist wrong? > > > > We have some issues with ganesha (on spectrum scale protocol nodes) > > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > > have a large number of open files, 200?000-400?000 open files per daemon > > (and 500 threads and about 250 client connections). Other nodes have > > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > > > If someone could explain how ganesha decides which files to keep open > > and which to close that would help, too. As NFSv3 is stateless the > > client doesn?t open/close a file, it?s the server to decide when to > > close it? We do have a few NFSv4 clients, too. > > > > Are there certain access patterns that can trigger such a large number > > of open file? Maybe traversing and reading a large number of small files? > > > > Thank you, > > > > Heiner > > > > I did count the open files by counting the entries in /proc/ > ganesha>/fd/ . 
With several 100k entries I failed to do a ?ls -ls? to > > list all the symbolic links, hence I can?t relate the open files to > > different exports easily. > > > > I did post this to the ganesha mailing list, too. > > > > -- > > > > ======================= > > > > Heinrich Billich > > > > ETH Z?rich > > > > Informatikdienste > > > > Tel.: +41 44 632 72 56 > > > > heinrich.billich at id.ethz.ch > > > > ======================== > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Frederik Ferner Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 Duty Sys Admin can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) From kkr at lbl.gov Thu Oct 3 01:01:39 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 2 Oct 2019 17:01:39 -0700 Subject: [gpfsug-discuss] Slides from last US event, hosting and speaking at events and plan for next events Message-ID: Hi all, The slides from the UG event at NERSC/LBNL are making there way here: https://www.spectrumscaleug.org/presentations/ Most of them are already in place. Thanks to all who attended, presented and participated. It?s great when we have interactive discussions at these events. We?d like to ask you, as GPFS/Spectrum Scale users, to consider hosting a future UG event at your site or giving a site update. I?ve been asked *many times*, why aren?t there more site updates? So you tell me?is there a barrier that I?m not aware of? We?re a friendly group (really!) and want to hear about your successes and your failures. We all learn from each other. Let me know if you have any thoughts about this. As a reminder, there is an upcoming Australian event and 2 upcoming US events Australia ? Sydney October 18th https://www.spectrumscaleug.org/event/spectrum-scale-user-group-at-ibm-systems-technical-university-australia/ US ? NYC October 10th https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ ? 
SC19 at Denver November 17th - This year we will include a morning session for new users and lunch. Online agenda will be available soon. https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Any feedback for the agendas for these events, or in general, please let us know. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruben.cremades at roche.com Thu Oct 3 08:18:03 2019 From: ruben.cremades at roche.com (Cremades, Ruben) Date: Thu, 3 Oct 2019 09:18:03 +0200 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Thanks Tomer, I have opened TS002806998 Regards Ruben On Wed, Oct 2, 2019 at 12:54 PM Tomer Perry wrote: > Simon, > > It looks like its setting the Out Of Order MLX5 environmental parameter: > > *https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs* > > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/10/2019 18:17 > Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, > > In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. > Could anyone comment on what that might do and if it relates to the > ordering that ?verbsPorts? are set? > > Thanks > > Simon_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Rub?n Cremades Science Infrastructure F.Hoffmann-La Roche Ltd. Bldg 254 / Room 04 - NBH01 Wurmisweg 4303 - Kaiseraugst Phone: +41-61-687 26 25 ruben.cremades at roche.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Oct 3 10:14:01 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 3 Oct 2019 09:14:01 +0000 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Thanks Tomer. That makes sense, also not something I think we need to worry about ? I assume that relates to hypercube or dragonfly or some such though the Mellanox docs only say ?some topologies? 
Simon From: on behalf of "TOMP at il.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 2 October 2019 at 11:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] verbsPortsOutOfOrder Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Oct 3 10:17:15 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 3 Oct 2019 09:17:15 +0000 Subject: [gpfsug-discuss] CIFS protocol access does not honor secondary groups In-Reply-To: References: Message-ID: This works for us, so it's something that should work. It's probably related to the way your authentication is setup, we used to use custom from before IBM supporting AD+LDAP and we had to add entries for the group SID in the LDAP server also, but since moving to "supported" way of doing this, we don't think we need this anymore.. You might want to do some digging with the wbinfo command and see if groups/SIDs resolve both ways, but I'd suggest opening a PMR on this. You could also check what file-permissions look like with mmgetacl. In the past we've seen some funkiness where creator/owner isn't on/inherited, so if the user owns the file/directory but the permission is to the group rather than directly the user, they can create new files but then not read them afterwards (though other users in the group can). I forget the exact details as we worked a standard inheritable ACL that works for us __ Simon ?On 02/10/2019, 18:02, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: After converting from clustered CIFS to CES protocols, we?ve noticed that SMB users can?t access files owned by groups that they are members of, unless that group happens to be their primary group. Have read the smb.conf man page, and don?t see anything obvious that would control this? What might we be missing? Thanks, ? ddj Dave Johnson Brown University CCV/CIS _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TOMP at il.ibm.com Thu Oct 3 10:44:32 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 3 Oct 2019 12:44:32 +0300 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Simon, I believe that adaptive routing might also introduce out of order packets - but I would ask Mellanox as to when they recommend to use it. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: gpfsug main discussion list Date: 03/10/2019 12:14 Subject: [EXTERNAL] Re: [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Tomer. That makes sense, also not something I think we need to worry about ? I assume that relates to hypercube or dragonfly or some such though the Mellanox docs only say ?some topologies? Simon From: on behalf of "TOMP at il.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 2 October 2019 at 11:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] verbsPortsOutOfOrder Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=rn4emIykuWgljnk6nj_Ay8TFU177BWp8qeaVAjmenfM&s=dO3QHcwm0oVHnHKGtdwIi2Q8mXWvL6JPmU7aVuRRMx0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Oct 3 14:55:19 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 3 Oct 2019 13:55:19 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: , , Message-ID: <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> After further investigaion, it seems like this XDS software is using memory mapped io when operating on the files. Is it possible that MMAP IO has a higher performance hit by AFM than regular file access? /Andreas ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Andreas Mattsson Skickat: den 1 oktober 2019 08:33:35 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, I've tried increasing all the refresh intervals, but even at 300 seconds, there is very little performance increase. The job runs in several steps, and gets held up at two places, as far as I can see. 
First at a kind of parallelisation step where about 1000-3000 files are created in the current working folder on a single compute node, and then at a step where lots of small output files are written on each of the compute nodes involved in the job. Comparing with running the same data set on a non-AFM cache fileset in the same storage system, it runs at least a factor 5 slower, even with really high refresh intervals. In the Scale documentation, it states that the afmRefreshAsync is only configurable cluster wide. Is it also configurable on a per-fileset level? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.3/com.ibm.spectrum.scale.v5r03.doc/bl1adm_configurationparametersAFM.htm The software is XDS, http://xds.mpimf-heidelberg.mpg.de/ Unfortunately it is a closed source software, so it is not possible to adapt the software. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Venkateswara R Puvvada Skickat: den 27 september 2019 10:23:13 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [cid:_4_DB7D1BA8DB7D1920002E115D65258482] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ATT00001.png Type: image/png Size: 4232 bytes Desc: ATT00001.png URL: From christof.schmitt at us.ibm.com Thu Oct 3 17:02:17 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 3 Oct 2019 16:02:17 +0000 Subject: [gpfsug-discuss] =?utf-8?q?CIFS_protocol_access_does_not_honor_se?= =?utf-8?q?condary=09groups?= In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Oct 3 18:15:04 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 3 Oct 2019 17:15:04 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Oct 3 18:31:34 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 3 Oct 2019 17:31:34 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From will.schmied at stjude.org Thu Oct 3 19:59:22 2019 From: will.schmied at stjude.org (Schmied, Will) Date: Thu, 3 Oct 2019 18:59:22 +0000 Subject: [gpfsug-discuss] Job: HPC Storage Architect at St. Jude Message-ID: <277C9DAD-06A2-4BD9-906F-83BFDDCDD965@stjude.org> Happy almost Friday everyone, St. Jude Children?s Research Hospital (Memphis, TN) has recently posted a job opening for a HPC Storage Architect, a senior level position working primarily to operate and maintain multiple Spectrum Scale clusters in support of research and other HPC workloads. You can view the job posting, and begin your application, here: http://myjob.io/nd6qd You can find all jobs, and information about working at St. Jude, here: https://www.stjude.org/jobs/hospital.html Please feel free to contact me directly off list if you have any questions. I?ll also be at SC this year and hope to see you there. Thanks, Will ________________________________ Email Disclaimer: www.stjude.org/emaildisclaimer Consultation Disclaimer: www.stjude.org/consultationdisclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Fri Oct 4 06:49:35 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 4 Oct 2019 05:49:35 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Fri Oct 4 07:32:42 2019 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Fri, 4 Oct 2019 08:32:42 +0200 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. 
Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: > >> @Malahal: Looks like you have written the netgroup caching code, > feel free to ask for further details if required. > Hi Ulrich, Ganesha uses innetgr() call for netgroup information and > sssd has too many issues in its implementation. Redhat said that they > are going to fix sssd synchronization issues in RHEL8. It is in my > plate to serialize innergr() call in Ganesha to match kernel NFS > server usage! I expect the sssd issue to give EACCESS/EPERM kind of > issue but not EINVAL though. > If you are using sssd, you must be getting into a sssd issue. > Ganesha?has a host-ip cache fix in 5.0.2 PTF3. Please make sure you > use ganesha version?V2.5.3-ibm030.01 if you are using netgroups > (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) > Regards, Malahal. > > ----- Original message ----- > From: Ulrich Sibiller > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS > Date: Thu, Dec 13, 2018 7:32 PM > On 23.11.2018 14:41, Andreas Mattsson wrote: > > Yes, this is repeating. > > > > We?ve ascertained that it has nothing to do at all with file > operations on the GPFS side. > > > > Randomly throughout the filesystem mounted via NFS, ls or file > access will give > > > > ? > > > > ?> ls: reading directory /gpfs/filessystem/test/testdir: Invalid > argument > > > > ? > > > > Trying again later might work on that folder, but might fail > somewhere else. > > > > We have tried exporting the same filesystem via a standard > kernel NFS instead of the CES > > Ganesha-NFS, and then the problem doesn?t exist. > > > > So it is definitely related to the Ganesha NFS server, or its > interaction with the file system. > > ?> Will see if I can get a tcpdump of the issue. > > We see this, too. We cannot trigger it. Fortunately I have managed > to capture some logs with > debugging enabled. I have now dug into the ganesha 2.5.3 code and > I think the netgroup caching is > the culprit. > > Here some FULL_DEBUG output: > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for > export id 1 path /gpfsexport > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? 
?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :EXPORT ?(options=03303002 ? ? > ? ? ? ? ?, ? ? , ? ?, > ?? ? ?, ? ? ? ? ? ? ? , -- Deleg, ? ? ? ? ? ? ? ?, ? ? ? ?) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS > (options=42102002root_squash ? , ----, 3--, ---, > TCP, ----, Manage_Gids ? , ? ? ? ? , anon_uid= ? ?-2, anon_gid= ? > ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :default options > (options=03303002root_squash ? , ----, 34-, UDP, > TCP, ----, No Manage_Gids, -- Deleg, anon_uid= ? ?-2, anon_gid= ? > ?-2, none, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :Final options > (options=42102002root_squash ? , ----, 3--, ---, > TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ? ?-2, anon_gid= ? > ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute > :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to > access Export_Id 1 /gpfsexport, > vers=3, proc=18 > > The client "client1" is definitely a member of the "netgroup1". > But the NETGROUP_CLIENT lookups for > "netgroup2" and "netgroup3" can only happen if the netgroup > caching code reports that "client1" is > NOT a member of "netgroup1". > > I have also opened a support case at IBM for this. > > @Malahal: Looks like you have written the netgroup caching code, > feel free to ask for further > details if required. > > Kind regards, > > Ulrich Sibiller > > -- > Dipl.-Inf. Ulrich Sibiller ? ? ? ? ? science + computing ag > System Administration ? ? ? ? ? ? ? ? ? ?Hagellocher Weg 73 > ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 72070 Tuebingen, Germany > https://atos.net/de/deutschland/sc > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.schlipalius at pawsey.org.au Fri Oct 4 07:37:17 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Fri, 04 Oct 2019 14:37:17 +0800 Subject: [gpfsug-discuss] 2019 October 18th Australian Spectrum Scale User Group event - last call for user case speakers Message-ID: Hello all, This is the final announcement for the Spectrum Scale Usergroup Sydney Australia on Friday the 18th October 2019. All current Australian Spectrum Scale User Group event details can be found here: http://bit.ly/2YOFQ3u Last call for user case speakers please ? let me know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. This is a combined Spectrum Scale, Spectrum Archive, Spectrum Protect and Spectrum LSF event. We are looking forwards to a great Usergroup in Sydney. Thanks again to IBM for helping to arrange the venue and event booking. Best Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) GPFSUGAUS at gmail.com From mnaineni at in.ibm.com Fri Oct 4 11:55:20 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 4 Oct 2019 10:55:20 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch>, <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 4 16:51:34 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 4 Oct 2019 15:51:34 +0000 Subject: [gpfsug-discuss] Lenovo GSS Planned End-of-Support Message-ID: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi there, Anyone know for sure when Lenovo is planning to release it's last version of the GSS software for its GSS solutions? I figure someone might be sufficiently plugged into the development here. Thanks! - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdqfgAKCRCZv6Bp0Ryx vuDHAJ9vO2/G6YLVbnoifliLDztMcVhENgCg01jB7VhZA9M85hKUe2FUOrKRios= =4iyR -----END PGP SIGNATURE----- From ncalimet at lenovo.com Fri Oct 4 16:59:03 2019 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Fri, 4 Oct 2019 15:59:03 +0000 Subject: [gpfsug-discuss] [External] Lenovo GSS Planned End-of-Support In-Reply-To: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> References: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> Message-ID: Ryan, If the question really is for how long GSS will be supported, then maintenance releases are on the roadmap till at least 2022 in principle. If otherwise you are referring to the latest GSS code levels, then GSS 3.4b has been released late August. 
Regards, - Nicolas -- Nicolas Calimet, PhD | HPC System Architect | Lenovo DCG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Friday, October 4, 2019 17:52 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] Lenovo GSS Planned End-of-Support -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi there, Anyone know for sure when Lenovo is planning to release it's last version of the GSS software for its GSS solutions? I figure someone might be sufficiently plugged into the development here. Thanks! - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdqfgAKCRCZv6Bp0Ryx vuDHAJ9vO2/G6YLVbnoifliLDztMcVhENgCg01jB7VhZA9M85hKUe2FUOrKRios= =4iyR -----END PGP SIGNATURE----- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Fri Oct 4 17:15:08 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 4 Oct 2019 16:15:08 +0000 Subject: [gpfsug-discuss] [External] Lenovo GSS Planned End-of-Support In-Reply-To: References: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> Message-ID: <5228bcf4-fe1b-cfc7-e1aa-071131496011@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Yup, that's the question; thanks for the help. I'd heard a rumor that there was a 2020 date, and wanted to see if I could get any indication in particular as to whether that was true. Sounds like even if it's not 2022, it's probably not 2020. We're clear on the current version -- planning the upgrade at the moment . On 10/4/19 11:59 AM, Nicolas CALIMET wrote: > Ryan, > > If the question really is for how long GSS will be supported, then > maintenance releases are on the roadmap till at least 2022 in > principle. If otherwise you are referring to the latest GSS code > levels, then GSS 3.4b has been released late August. > > Regards, - Nicolas > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdwBAAKCRCZv6Bp0Ryx vjAWAJ9OGbVfhM0m+/NXCRzXo8raIj/tNwCeMtg0osqnl3l16J4TC3oZGw9xxk4= =utaK -----END PGP SIGNATURE----- From kkr at lbl.gov Fri Oct 4 21:53:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 4 Oct 2019 13:53:20 -0700 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? Message-ID: Hi, There is a flag with mmlsquota to prevent the potential of getting negative values back: -e Specifies that mmlsquota is to collect updated quota usage data from all nodes before displaying results. If -e is not specified, there is the potential to display negative usage values as the quota server may process a combination of up-to-date and back-level information. However, we are using the API to collectively show quotas across GPFS and non-GPFS filesystems via one user-driven command. We are getting negative values using the API. 
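For reference, what we do per filesystem boils down to something like the sketch below (trimmed and untested here; the struct field names and the 1 KiB block units are my reading of gpfs.h rather than anything authoritative, so treat them as assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <gpfs.h>    /* ships with gpfs.base; link with -lgpfs */

/* Minimal sketch: print the user quota for one uid on a GPFS path. */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <gpfs-path> <uid>\n", argv[0]);
        return 1;
    }
    gpfs_quotaInfo_t q;
    int rc = gpfs_quotactl(argv[1], GPFS_QCMD(Q_GETQUOTA, GPFS_USRQUOTA),
                           atoi(argv[2]), &q);
    if (rc != 0) {
        perror("gpfs_quotactl");
        return 1;
    }
    /* this is where the negative usage numbers show up for us */
    printf("usage %lld KiB  soft %lld  hard %lld  in doubt %lld\n",
           (long long)q.blockUsage, (long long)q.blockSoftLimit,
           (long long)q.blockHardLimit, (long long)q.blockInDoubt);
    return 0;
}

(Compiled with something like "gcc -o myquota myquota.c -lgpfs".)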
Does anyone know the -e equivalent for the API?

https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm

Thanks,
Kristy
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From vpuvvada at in.ibm.com Sat Oct 5 05:30:49 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Sat, 5 Oct 2019 10:00:49 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> References: , , <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> Message-ID:

I would recommend opening a case and collecting the default traces from both the gateway and the application (or protocol) nodes to check the RPC overhead. There should not be a difference between mmap I/O and regular I/O for AFM filesets. Also note that refresh intervals are stored as part of the inode; when a large number of files is accessed, it is possible that inodes are evicted as part of dcache shrinkage and the next access to the same files goes back to home for revalidation.

The afmRefreshAsync option can also be set at the fileset level. It looks like this is missing from the documentation and will be corrected.

~Venkat (vpuvvada at in.ibm.com)

From: Andreas Mattsson To: gpfsug main discussion list Date: 10/03/2019 07:25 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org

After further investigation, it seems like this XDS software is using memory-mapped I/O when operating on the files. Is it possible that mmap I/O takes a bigger performance hit from AFM than regular file access?

/Andreas
____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O.
Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Venkateswara R Puvvada Skickat: den 27 september 2019 10:23:13 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=vrw7qt4uEH-dBuEZSxUvPQM-SJOC0diQptL6vnfxCQA&s=rbRvqgv05seDPo5wFgK2jlRkzvHtU7y7zoNQ3rDV0d0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From st.graf at fz-juelich.de Mon Oct 7 08:22:02 2019 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 7 Oct 2019 09:22:02 +0200 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? In-Reply-To: References: Message-ID: <9c1fcd81-d947-e857-ffc8-b68d17142bfb@fz-juelich.de> Hi Kristi, I just want to mention that we have a ticket right now at IBM because of negative quota values. In our case even the '-e' does not work: [root at justnsd01a ~]#? 
mmlsquota -j hpsadm -e largedata

                          Block Limits                                   |   File Limits
Filesystem type           KB         quota         limit  in_doubt grace |  files   quota    limit  in_doubt grace  Remarks
largedata  FILESET -45853247616 536870912000 590558003200        0  none |      6 3000000  3300000         0  none

The solution offered by support is to run a 'mmcheckquota'. We are still in discussion.

Stephan

On 10/4/19 10:53 PM, Kristy Kallback-Rose wrote: > Hi, > > There is a flag with mmlsquota to prevent the potential of getting > negative values back: > > -e > Specifies that mmlsquota is to collect updated quota usage data from all > nodes before displaying results. If -e is not specified, there is the > potential to display negative usage values as the quota server may > process a combination of up-to-date and back-level information. > > > However, we are using the API to collectively show quotas across GPFS > and non-GPFS filesystems via one user-driven command. We are getting > negative values using the API. Does anyone know the -e equivalent for > the API? > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm > > Thanks, > Kristy > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss >

-- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/
-------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5322 bytes Desc: S/MIME Cryptographic Signature URL:

From jonathan.buzzard at strath.ac.uk Mon Oct 7 15:07:55 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 7 Oct 2019 14:07:55 +0000 Subject: [gpfsug-discuss] Large in doubt on fileset Message-ID: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk>

I have a DSS-G system running 4.2.3-7, and on Friday afternoon became aware that there is a very large (at least I have never seen anything on this scale before) in doubt on a fileset. It has persisted over the weekend and is sitting at 17.5TB, with the fileset having a 150TB quota and only 82TB in use.

There are a relatively large 26,500 files in doubt, though there are no quotas on file numbers for the fileset. This has come down from some 47,500 on Friday, when the in doubt was a shade over 18TB.

The largest in doubt I have seen in the past was in the order of a few hundred GB under very heavy write, and it went away very quickly after the writing stopped.

There is no evidence of heavy writing going on in the file system, so I am perplexed as to why the in doubt is remaining so high.

Any thoughts as to what might be going on?

JAB.

-- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG

From pinto at scinet.utoronto.ca Mon Oct 7 15:24:38 2019 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 7 Oct 2019 10:24:38 -0400 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> Message-ID: <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca>

We run DSS as well, also 4.2.x versions, and large in-doubt entries are common on our file systems, much larger than what you are seeing, for USR, GRP and FILESET. It didn't use to be so bad on versions 3.4|3.5 in other IBM appliances (GSS, ESS), even DDN's or Cray G200. Under the 4.x series the internal automatic mechanism to reconcile accounting seems very laggy by default, and I couldn't find (yet) a config parameter to adjust this.

I stopped trying to understand why this happens. Our users are all subject to quotas, and can't wait indefinitely for this reconciliation. I just run mmcheckquota every 6 hours via a crontab.

I hope version 5 is better. Will know in a couple of months.

Jaime

On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote: > > I have a DSS-G system running 4.2.3-7, and on Friday afternoon became > aware that there is a very large (at least I have never seen anything > on this scale before) in doubt on a fileset. It has persisted over the > weekend and is sitting at 17.5TB, with the fileset having a 150TB quota > and only 82TB in use. > > There is a relatively large 26,500 files in doubt, though there is no > quotas on file numbers for the fileset. This has come down from some > 47,500 on Friday when the in doubt was a shade over 18TB. > > The largest in doubt I have seen in the past was in the order of a few > hundred GB under very heavy write that went away very quickly after the > writing stopped. > > There is no evidence of heavy writing going on in the file system so I > am perplexed as to why the in doubt is remaining so high. > > Any thoughts as to what might be going on? > > > JAB. >

************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477

From TOMP at il.ibm.com Mon Oct 7 17:22:13 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Mon, 7 Oct 2019 19:22:13 +0300 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> Message-ID:

Hi,

The major change around 4.X in quotas was the introduction of dynamic shares. In the past, every client share request was for a constant number of blocks (20 blocks by default). For a high-performing system that sometimes wasn't enough (imagine a 320M share when nodes are writing at 20GB/s). So, dynamic shares mean that a client node can request 10000 blocks and more (which doesn't mean the server will grant that much...). On the other hand, a node failure will leave more "stale in doubt" capacity, since the server doesn't know how much of the share was actually used. Imagine a client node getting 1024 blocks (16G), using 20M and crashing. From the server's perspective there are 16G "unknown"; now multiply that by multiple nodes...
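To put rough numbers on that (purely illustrative arithmetic - the share size comes from the example above, the node count is an assumption):

   1024 blocks/share x 16 MiB/block  = 16 GiB potentially left "in doubt" per failed client
   16 GiB/client x 1000 clients      = ~16 TiB in doubt in the worst case

which is the same order of magnitude as the 17.5TB reported at the start of this thread.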
The only way to solve it is indeed to execute mmcheckquota - but as you probably know, its not cheap. So, do you experience large number of node expels/crashes etc. that might be related to that ( otherwise, it might be some other bug that needs to be fixed...). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jaime Pinto To: gpfsug-discuss at spectrumscale.org Date: 07/10/2019 17:40 Subject: [EXTERNAL] Re: [gpfsug-discuss] Large in doubt on fileset Sent by: gpfsug-discuss-bounces at spectrumscale.org We run DSS as well, also 4.2.x versions, and large indoubt entries are common on our file systems, much larger than what you are seeing, for USR, GRP and FILESET. It didn't use to be so bad on versions 3.4|3.5 in other IBM appliances (GSS, ESS), even DDN's or Cray G200. Under 4.x series the internal automatic mechanism to reconcile accounting seems very laggy by default, and I couldn't find (yet) a config parameter to adjust this. I stopped trying to understand why this happens. Our users are all subject to quotas, and can't wait indefinitely for this reconciliation. I just run mmcheckquota every 6 hours via a crontab. I hope version 5 is better. Will know in a couple of months. Jaime On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote: > > I have a DSS-G system running 4.2.3-7, and on Friday afternoon became > aware that there is a very large (at least I have never seen anything > on this scale before) in doubt on a fileset. It has persisted over the > weekend and is sitting at 17.5TB, with the fileset having a 150TB quota > and only 82TB in use. > > There is a relatively large 26,500 files in doubt, though there is no > quotas on file numbers for the fileset. This has come down from some > 47,500 on Friday when the in doubt was a shade over 18TB. > > The largest in doubt I have seen in the past was in the order of a few > hundred GB under very heavy write that went away very quickly after the > writing stopped. > > There is no evidence of heavy writing going on in the file system so I > am perplexed as to why the in doubt is remaining so high. > > Any thoughts as to what might be going on? > > > JAB. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=2KzJ8YjgXm5NsAjcpquw6pMVJFbLUBZ-KEQb2oHFYqs&s=esG-w1Wj_wInSHpT5fEhqVQMqpR15ZXaGxoQmjOKdDc&e= ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=2KzJ8YjgXm5NsAjcpquw6pMVJFbLUBZ-KEQb2oHFYqs&s=dxj6p74pt5iaKKn4KvMmMPyLcUD5C37HbIc2zX-iWgY&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Oct 8 11:45:38 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Oct 2019 10:45:38 +0000 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> Message-ID: <841c1fd793b4179ea8e27b88f3ed1c7e0f76cb4e.camel@strath.ac.uk> On Mon, 2019-10-07 at 19:22 +0300, Tomer Perry wrote: [SNIP] > > So, do you experience large number of node expels/crashes etc. that > might be related to that ( otherwise, it might be some other bug that > needs to be fixed...). > Not as far as I can determine. The logs show only 58 expels in the last six months and around 2/3rds of those where on essentially dormant nodes that where being used for development work on fixing issues with the xcat node deployment for the compute nodes (triggering an rinstall on a node that was up with GPFS mounted but actually doing nothing). I have done an mmcheckquota which didn't take long to complete and now I the "in doubt" is a more reasonable sub 10GB. I shall monitor what happens more closely in future. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Tue Oct 8 14:15:48 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Oct 2019 09:15:48 -0400 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? In-Reply-To: References: Message-ID: Kristy, there is no equivalent to the -e option in the quota API. If your application receives negative quota values it is suggested that you use the mmlsquota command with the -e option to obtain the most recent quota usage information, or run the mmcheckquota command. Using either the -e option to mmlsquota or the mmcheckquota is an IO intensive operation so it would be wise not to run the command when the system is heavily loaded. Note that using the mmcheckquota command does provide QoS options to mitigate the impact of the operation on the cluster. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 10/04/2019 04:53 PM Subject: [EXTERNAL] [gpfsug-discuss] Quota via API anyway to avoid negative values? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, There is a flag with mmlsquota to prevent the potential of getting negative values back: -e Specifies that mmlsquota is to collect updated quota usage data from all nodes before displaying results. If -e is not specified, there is the potential to display negative usage values as the quota server may process a combination of up-to-date and back-level information. 
However, we are using the API to collectively show quotas across GPFS and non-GPFS filesystems via one user-driven command. We are getting negative values using the API. Does anyone know the -e equivalent for the API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm Thanks, Kristy_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hdhTNLoVRkMglSs8c9Ho37FKFZUJrCmrXG5pXqjtFbE&s=wfHn6xg9_2qzVFdBAthevvEHreS934rP1w88f3jSFcs&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Oct 9 16:50:31 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 9 Oct 2019 17:50:31 +0200 Subject: [gpfsug-discuss] =?utf-8?q?Fw=3A___Agenda_and_registration_link_/?= =?utf-8?q?/_Oct_10_-_Spectrum=09Scale_NYC_User_Meeting?= Message-ID: Reminder about the user meeting in NYC tomorrow. https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 09/10/2019 17:46 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 20/09/2019 10:12 Subject: [EXTERNAL] [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. 
Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=x_he-vxYPdTCut1I-gX7dq5MQmsSZA_1952yvpisLn0&s=ghgxcu8zRWQLv9DIXJ3-CX14SDFrx3hYKsjt-_IWZIM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Thu Oct 10 21:43:45 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 10 Oct 2019 15:43:45 -0500 Subject: [gpfsug-discuss] waiters and files causing waiters Message-ID: is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Oct 10 22:26:35 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 10 Oct 2019 17:26:35 -0400 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: The short answer is there is no easy way to determine what file/directory a waiter may be related. Generally, it is not necessary to know the file/directory since a properly sized/configured cluster should not have long waiters occurring, unless there is some type of failure in the cluster. If you were to capture sufficient information across the cluster you might be able to work out the file/directory involved in a long waiter but it would take either trace, or combing through lots of internal data structures. It would be helpful to know more details about your cluster to provide suggestions for what may be causing the long waiters. I presume you are seeing them on a regular basis and would like to understand why they are occurring. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
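One low-tech approach is to snapshot the waiters and the open GPFS files at the same moment and correlate afterwards - a rough sketch only (the node class 'all', the mount point /gpfs/fs0 and the interval are placeholders, and it assumes mmdsh is usable in your environment):

#!/bin/bash
# Periodically capture long waiters cluster-wide plus locally open GPFS files.
LOG=/var/tmp/waiter-watch.$(date +%Y%m%d)
while sleep 60; do
    echo "==== $(date) ====" >> "$LOG"
    # waiters from every node in the cluster
    /usr/lpp/mmfs/bin/mmdsh -N all '/usr/lpp/mmfs/bin/mmdiag --waiters' >> "$LOG" 2>&1
    # files currently open on the GPFS mount on this node; run the same on the
    # node named in the worst waiter if that is not the local one
    lsof /gpfs/fs0 >> "$LOG" 2>/dev/null
done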
From: Damir Krstic To: gpfsug main discussion list Date: 10/10/2019 04:44 PM Subject: [EXTERNAL] [gpfsug-discuss] waiters and files causing waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=9T66XmHIdF5y7JaNmf28qRGIn35K4t-9H7vwGkDMjgo&s=ncg0MQla29iX--sQeAmcB2XqE3_7zSFGmhnDgj9s--w&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex at calicolabs.com Thu Oct 10 23:33:06 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 10 Oct 2019 15:33:06 -0700 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: If the waiters are on a compute node and there is not much user work running there, then the open files listed by lsof will probably be the culprits. On Thu, Oct 10, 2019 at 1:44 PM Damir Krstic wrote: > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to > figure out which files or directories access (whether it's read or write) > is causing long-er waiters? > > in all my looking i have not been able to get that information out of > various diagnostic commands. > > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 00:05:16 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 10 Oct 2019 23:05:16 +0000 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> I?ll dig through my notes. I had a similar situation and an engineer taught me how to do it. It?s a bit involved though. Not something you?d bother with for something transient. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 10, 2019, at 16:44, Damir Krstic wrote: ? is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Oct 11 17:07:30 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 11 Oct 2019 16:07:30 +0000 Subject: [gpfsug-discuss] User Group Meeting at SC19 - Registration is Open! 
Message-ID: <9C59AEAC-C26D-47ED-9321-BCC6A58F2E05@nuance.com> Join us at SC19 for the user group meeting on Sunday November 17th at the Hyatt Regency in Denver! This year there will be a morning session for new users to Spectrum Scale. Afternoon portion will be a collection of updates from IBM and user/sponsor talks. Details Here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ (watch here for agenda updates) You do need to pre-register here: http://www.ibm.com/events/2019/SC19_BC This year we will have a limited number of box lunches available for users, free of charge. We?ll also have WiFi access for the attendees - Huzzah! Many thanks to our sponsors: IBM, Starfish Software, Mark III Systems, and Lenovo for helping us make this event possible and free of charge to all attendees. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bamirzadeh at tower-research.com Fri Oct 11 18:04:08 2019 From: bamirzadeh at tower-research.com (Behrooz Amirzadeh) Date: Fri, 11 Oct 2019 13:04:08 -0400 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> References: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> Message-ID: I think it depends on the type of deadlock. For example, if hung nodes are the cause of the deadlock. I don't think there will be any files to go after. I've seen that it is possible in certain cases but no guarantees. When the deadlock is detected you can look at the internaldump that gets created on the deadlock node, for example: ===== dump deadlock ===== Current time 2019-09-24_10:17:30-0400 Waiting 904.5729 sec since 10:02:25, on node aresnode7132, thread 3584968 SyncFSWorkerThread: on ThCond 0x18042226DB8 (LkObjCondvar), reason 'waiting for RO lock' Then you search in the same file for the ThCond further down. You'll most likely see that it is associated with a mutex ===== dump condvar ===== Current time 2019-09-24_10:17:32-0400 . . 
'LkObjCondvar' at 0x18042226DB8 (0xFFFFC90042226DB8) (mutex 'InodeCacheObjMutex' at 0x18042226C08 (0xFFFFC90042226C08 PTR_OK)) waitCount 1 condvarEventWordP 0xFFFF880DB4AAF088 Then you'll search for the that mutex in the same file ===== dump selected_files ===== Current time 2019-09-24_10:17:32-0400 Files in stripe group gpfs0: Selected: LkObj::mostWanted: 0x18042226D80 lock_state=0x2000000000000000 xlock_state=0x0 lock_flags=0x11 OpenFile: 429E985A0BFE280A:000000008285ECBD:0000000000000000 @ 0x18042226BD8 cach 1 ref 1 hc 3 tc 6 mtx 0x18042226C08 Inode: valid eff token xw @ 0x18042226D80, ctMode xw seq 175 lock state [ xw ] x [] flags [ dmn wka ] writer 39912 hasWaiters 1 0 Mnode: valid eff token xw @ 0x18042226DD0, ctMode xw seq 175 DMAPI: invalid eff token nl @ 0x18042226D30, ctMode nl seq 174 SMBOpen: valid eff token (A: M D: ) @ 0x18042226C60, ctMode (A: M D: ) Flags 0x30 (pfro+pfxw) seq 175 lock state [ (nil) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x18042226CD0, ctMode wf Flags 0x30 (pfro+pfxw) seq 175 BR: @ 0x18042226E30, ctMode nl Flags 0x10 (pfro) seq 175 treeP 0x18048C1EFB8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <1335> Fcntl: @ 0x18042226E58, ctMode nl Flags 0x30 (pfro+pfxw) seq 175 treeP 0x1801EBA7EE8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <1335> * inode 2189814973* snap 0 USERFILE nlink 1 genNum 0x2710E0CC mode 0200100644: -rw-r--r-- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 0 lastFrom 65535 switchCnt 0 BRL nXLocksOrRelinquishes 6 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 lastAllocLsn 0xB8740C5E metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 1 dirty status: dirty fileDirty 1 fileDirtyOrUncommitted 1 dirtiedSyncNum 81078 inodeValid 1 inodeDirtyCount 5 objectVersion 1 mtimeDirty 1 flushVersion 8983 mnodeChangeCount 1 dirtyDataBufs 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 10213 synchedFileSize 0 indirectionLevel 1 atime 1569333733.493440000 mtime 1569333742.784833000 ctime 1569333742.784712266 crtime 1569333733.493440000 * owner uid 6572 gid 3047* If you were lucky and all of these were found you can get the inode and the uid/gid of the owner of the file. If you happen to catch it quick enough you'll be able to find the file with lsof. Otherwise later with an ilm policy run if the file has not been deleted by the user. Behrooz On Thu, Oct 10, 2019 at 7:05 PM Ryan Novosielski wrote: > I?ll dig through my notes. I had a similar situation and an engineer > taught me how to do it. It?s a bit involved though. Not something you?d > bother with for something transient. > > -- > ____ > || \\UTGERS, > |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, > Newark > `' > > On Oct 10, 2019, at 16:44, Damir Krstic wrote: > > ? > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to > figure out which files or directories access (whether it's read or write) > is causing long-er waiters? > > in all my looking i have not been able to get that information out of > various diagnostic commands. 
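(Adding to the walkthrough above: once you have pulled an inode number such as 2189814973 out of the dump, a deferred LIST policy is one way to map it back to a path after the fact. A sketch only - 'gpfs0' and the file names are placeholders:)

/* find-inode.pol */
RULE 'byinode' LIST 'suspects' WHERE INODE = 2189814973

# generate the list without acting on anything
mmapplypolicy gpfs0 -P find-inode.pol -I defer -f /tmp/find-inode
# the matching path(s) should land in /tmp/find-inode.list.suspects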
> > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 18:43:15 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 11 Oct 2019 17:43:15 +0000 Subject: [gpfsug-discuss] Quotas and AFM Message-ID: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From S.J.Thompson at bham.ac.uk Fri Oct 11 20:56:04 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 19:56:04 +0000 Subject: [gpfsug-discuss] Quotas and AFM Message-ID: Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? 
Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 21:05:15 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 11 Oct 2019 20:05:15 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: Message-ID: Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. 
At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 11 21:10:20 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 20:10:20 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , Message-ID: Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. 
Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? 
We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 11 21:21:59 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 20:21:59 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , , Message-ID: Oh and I forgot. This only works if you precache th data from home. Otherwise the disk usage at cache is only what you cached, as you don't know what size it is from home. Unless something has changed recently at any rate. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Friday, October 11, 2019 9:10:20 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. 
If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Oct 14 06:11:21 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 14 Oct 2019 10:41:21 +0530 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> References: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> Message-ID: As of today AFM does not support replication or caching of the filesystem or fileset level metadata like quotas, replication factors etc.. , it only supports replication of user's metadata and data. Users have to make sure that same quotas are set at both cache and home clusters. An error message is logged (mmfs.log) at AFM cache gateway if the home have quotas exceeded, and the queue will be stuck until the quotas are increased at the home cluster. 
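For anyone chasing the same symptom, two quick checks on the cache cluster can confirm whether a stuck queue is quota related. This is a generic sketch: gpfs01 and home are placeholder names, and the exact wording of the logged error varies by release.

# show the AFM fileset state, its gateway node and the queue length
mmafmctl gpfs01 getstate -j home

# then, on the gateway node it names, look for quota errors reported on writeback to home
grep -i quota /var/adm/ras/mmfs.log.latest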
~Venkat (vpuvvada at in.ibm.com)

From: Ryan Novosielski
To: gpfsug main discussion list
Date: 10/11/2019 11:13 PM
Subject: [EXTERNAL] [gpfsug-discuss] Quotas and AFM
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home on one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we're running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we're also seeing delays on unaffected users who are not over their quota, but that's harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks!

-- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `'
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=v6Rlb90lfAveMK0img3_DIq6tq6dce4WXaxNhN0TDBQ&s=PNlMZJgKMhodVCByv07nOOiyF2Sr498Rd4NmIaOkL9g&e=
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vpuvvada at in.ibm.com Mon Oct 14 07:29:05 2019
From: vpuvvada at in.ibm.com (Venkateswara R Puvvada)
Date: Mon, 14 Oct 2019 11:59:05 +0530
Subject: [gpfsug-discuss] Quotas and AFM
In-Reply-To: References: , , Message-ID:

As Simon already mentioned, set similar quotas at both the cache and home clusters to avoid the queue getting stuck because quotas are exceeded at home.

>At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet >quotas.

AFM will support dependent filesets from 5.0.4. Dependent filesets can then be created at the cache inside the independent fileset and given the same quotas as at home.

>We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now.

AFM uses some inode space to store the remote file's attributes, such as the file handle and file times, as extended attributes (EAs). If the file does not have hard links, the maximum inode space used by AFM is around 200 bytes. The AFM cache can store a file's data in the inode if it has 200 bytes or more of free space in the inode; otherwise the file's data is stored in a subblock rather than using a full block.
For example if the inode size is 4K at both cache and home, if the home file size is 3k and inode is using 300 bytes to store the file metadata, then free space in the inode at the home will be 724 bytes(4096 - (3072 + 300)). When this file is cached by the AFM , AFM adds internal EAs for 200 bytes, then the free space in the inode at the cache will be 524 bytes(4096 - (3072 + 300 + 200)). If the filesize is 3600 bytes at the home, AFM cannot store the data in the inode at the cache. So AFM stores the file data in the block only if it does not have enough space to store the internal EAs. ~Venkat (vpuvvada at in.ibm.com) From: Simon Thompson To: gpfsug main discussion list Date: 10/12/2019 01:52 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Quotas and AFM Sent by: gpfsug-discuss-bounces at spectrumscale.org Oh and I forgot. This only works if you precache th data from home. Otherwise the disk usage at cache is only what you cached, as you don't know what size it is from home. Unless something has changed recently at any rate. Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Friday, October 11, 2019 9:10:20 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. 
At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=FQMV8_Ivetm1R6_TcCWroPT58pjhPJgL39pgOdQEiqw&s=DfvksQLrKgv0OpK3Dr5pR-FUkhNddIvieh9_8h1jyGQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 13:34:33 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 12:34:33 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs Message-ID: We are in the process of changing the way GPFS assigns UID/GIDs from internal tdb to using AD RIDs with an offset that matches our linux systems. We, therefore, need to change the ACLs for all the files in GPFS (up to 80 million). We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs being applied. (This system was set up 14 years ago and has changed roles over time) We are running on linux, so need to have POSIX permissions enabled. 
What I want to know for those in a similar environment, what do you have as the POSIX owner and group, when NFSv4 ACLs are in use? root:root or do you have all files owned by a filesystem administrator account and group: : on our samba shares we have : admin users = @ So don't actually need the group defined in POSIX. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 13:51:55 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 12:51:55 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs Message-ID: <531FC6DE-7928-4A4F-B444-DC9D1D78F705@bham.ac.uk> Hi Paul, We use both Windows and Linux with our FS but only have NFSv4 ACLs enabled (we do also set ?chmodAndSetAcl? on the fileset which makes chmod etc work whilst not breaking the ACL badly). We?ve only found 1 case where POSIX ACLs were needed, and really that was some other IBM software that didn?t understand ACLs (which is now fixed). The groups exist in both AD and our internal LDAP where they have gidNumbers assigned. For our research projects we set the following as the default on the directory: $ mmgetacl some-project #NFSv4 ACL #owner:root #group:gITS_BEAR_2019- some-project special:owner@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:----:allow (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED special:owner@:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwx-:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Simon From: on behalf of Paul Ward Reply to: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 15 October 2019 at 13:34 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] default owner and group for POSIX ACLs We are in the process of changing the way GPFS assigns UID/GIDs from internal tdb to using AD RIDs with an offset that matches our linux systems. We, therefore, need to change the ACLs for all the files in GPFS (up to 80 million). We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs being applied. (This system was set up 14 years ago and has changed roles over time) We are running on linux, so need to have POSIX permissions enabled. What I want to know for those in a similar environment, what do you have as the POSIX owner and group, when NFSv4 ACLs are in use? root:root or do you have all files owned by a filesystem administrator account and group: : on our samba shares we have : admin users = @ So don?t actually need the group defined in POSIX. 
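As a rough sketch of how a setup like the one Simon describes can be applied to a fileset and a project directory (the names and paths are invented, and the options should be checked against the mmchfileset, mmgetacl and mmputacl documentation for your release):

# stop chmod from silently replacing the NFSv4 ACL on the fileset
mmchfileset gpfs01 projects --allow-permission-change chmodAndSetAcl

# inspect or edit the directory's NFSv4 ACL in place
mmgetacl -k nfs4 /gpfs/gpfs01/projects/some-project
mmeditacl -k nfs4 /gpfs/gpfs01/projects/some-project

# or apply a prepared ACL file containing the inheritable entries
mmputacl -i project-acl.txt /gpfs/gpfs01/projects/some-project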
Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 15 15:30:28 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 15 Oct 2019 14:30:28 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Oct 15 16:41:35 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 15:41:35 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. 
> > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Tue Oct 15 17:09:14 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 15 Oct 2019 16:09:14 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 17:15:50 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 16:15:50 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: An amalgamated answer... > You do realize that will mean backing everything up again... From the tests that I have done, it appears not. A Spectrum protect incremental backup performs an 'update' when the ACL is changed via mmputacl or chown. when I do a backup after an mmputacl or chown ACL change on a migrated file, it isn't recalled, so it cant be backing up the file. If I do the same change from windows over a smb mount, it does cause the file to be recalled and backedup. > ...I am not sure why you need POSIX ACL's if you are running Linux... From what I have recently read... https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." As I said this system has had roles added to it. The original purpose was to only support NFS exports, then as a staging area for IT, as end user access wasn't needed, only POSIX permissions were used. No it has end user SMB mounts. >?chmodAndSetAcl? Saw this recently - will look at changing to that! https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_authoriziefileprotocolusers.htm "To allow proper use of ACLs, it is recommended to prevent chmod from overwriting the ACLs by setting this parameter to setAclOnly or chmodAndSetAcl." >#owner:root OK so you do have root as the owner. > special:owner@:rwxc:allow:FileInherit:DirInherit And have it propagated to children. > group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit We by default assign two groups to a folder, a RW and R only. > special:everyone@:----:allow > special:owner@:rwxc:allow > special:group@:rwx-:allow I have been removing these. 
This seems to work, but was set via windows: POSIX: d--------- 2 root root 512 Apr 11 2019 #NFSv4 ACL #owner:root #group:root #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED # NULL_SACL group:dg--ro:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:dg--rwm:rwx-:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:dl-:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED So is root as the owner the norm? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 15 October 2019 15:30 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C54e024b8b52b4a70208e08d7517c47fc%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637067466552637538&sdata=v43g1MEBnRBZP%2B5J7ORvywIq6poqhK24fTsCco0IEDo%3D&reserved=0 From p.ward at nhm.ac.uk Tue Oct 15 17:18:15 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 16:18:15 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? 
> root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Oct 15 17:49:34 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 15 Oct 2019 16:49:34 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 19:27:01 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 18:27:01 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: I have tested replacing POSIX with NFSv4, I have altered POSIX and altered NFSv4. The example below is NFSv4 changed to POSIX I have also tested on folders. Action Details Pre Changes File is backed up, migrated and has a nfsv4 ACL > ls -l ---------- 1 root 16777221 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mb-9.dat >mmgetacl #NFSv4 ACL #owner:root #group:16777221 group:1399645580:rwx-:allow:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:16783540:rwx-:allow:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:16777360:r-x-:allow:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:1399621272:r-x-:allow:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Erase the nfsv4 acl chown root:root chmod 770 POSIX permissions changed and NFSv4 ACL gone > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mb-9.dat >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- Incremental backup Backup ?updates? the backup, but doesn?t transfer any data. dsmc incr "100mb-9.dat" IBM Tivoli Storage Manager Command Line Backup-Archive Client Interface Client Version 7, Release 1, Level 6.4 Client date/time: 10/15/2019 17:57:59 (c) Copyright by IBM Corporation and other(s) 1990, 2016. All Rights Reserved. 
Node Name: NHM-XXX-XXX Session established with server TSM-XXXXXX: Windows Server Version 7, Release 1, Level 7.0 Server date/time: 10/15/2019 17:57:58 Last access: 10/15/2019 17:57:52 Accessing as node: XXX-XXX Incremental backup of volume '100mb-9.dat' Updating--> 102,400,000 /?/100mb-9.dat [Sent] Successful incremental backup of '/?/100mb-9.dat' Total number of objects inspected: 1 Total number of objects backed up: 0 Total number of objects updated: 1 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of objects grew: 0 Total number of retries: 0 Total number of bytes inspected: 97.65 MB Total number of bytes transferred: 0 B Data transfer time: 0.00 sec Network data transfer rate: 0.00 KB/sec Aggregate data transfer rate: 0.00 KB/sec Objects compressed by: 0% Total data reduction ratio: 100.00% Elapsed processing time: 00:00:01 Post backup Active Backup timestamp hasn?t changed, and file is still migrated. > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mbM/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mbM/100mb-9.dat >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- Restore dsmc restore "100mb-9.dat" "100mb-9.dat.restore" IBM Tivoli Storage Manager Command Line Backup-Archive Client Interface Client Version 7, Release 1, Level 6.4 Client date/time: 10/15/2019 18:02:09 (c) Copyright by IBM Corporation and other(s) 1990, 2016. All Rights Reserved. Node Name: NHM-XXX-XXX Session established with server TSM-XXXXXX: Windows Server Version 7, Release 1, Level 7.0 Server date/time: 10/15/2019 18:02:08 Last access: 10/15/2019 18:02:07 Accessing as node: HSM-NHM Restore function invoked. Restoring 102,400,000 /?/100mb-9.dat --> /?/100mb-9.dat.restore [Done] Restore processing finished. Total number of objects restored: 1 Total number of objects failed: 0 Total number of bytes transferred: 97.66 MB Data transfer time: 1.20 sec Network data transfer rate: 83,317.88 KB/sec Aggregate data transfer rate: 689.11 KB/sec Elapsed processing time: 00:02:25 Restored file Restored file has the same permissions as the last backup > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat.restore > dsmls 102400000 102400000 160 r 100mb-9.dat.restore > dsmc q backup ?? -inac ANS1092W No files matching search criteria were found >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- I have just noticed: File backedup with POSIX ? restored file permissions POSIX File backedup with POSIX, changed to NFSv4 permissions, incremental backup ? restore file permissions POSIX File backedup with NFSv4, Changed to POSIX permissions, incremental backup ? restore file permissions POSIX File backedup with NFSv4, restore file permissions NFSv4 (there may be other variables involved) Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:50 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs Thanks Paul. Could you please clarify which ACL you changed, the GPFS NFSv4 ACL or the POSIX ACL? 
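One way to answer that question from the command line is to ask GPFS which form the ACL is actually stored in for a given file. A hedged sketch, with the path as a placeholder:

# show the ACL in whatever form it is natively stored (POSIX entries or NFSv4)
mmgetacl -k native /gpfs/somefs/somedir/100mb-9.dat

# and the translated views for comparison
mmgetacl -k posix /gpfs/somefs/somedir/100mb-9.dat
mmgetacl -k nfs4 /gpfs/somefs/somedir/100mb-9.dat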
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Paul Ward > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 12:18 PM Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 19:46:06 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 18:46:06 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: Only the top level of the project is root:root, not all files. The owner inherit is like CREATOROWNER in Windows, so the parent owner isn't inherited, but the permission inherits to newly created files. It was a while ago we worked out our permission defaults but without it we could have users create a file/directory but not be able to edit/change it as whilst the group had permission, the owner didn't. I should note we are all at 5.x code and not 4.2. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul Ward Sent: Tuesday, October 15, 2019 5:15:50 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs An amalgamated answer... > You do realize that will mean backing everything up again... >From the tests that I have done, it appears not. A Spectrum protect incremental backup performs an 'update' when the ACL is changed via mmputacl or chown. when I do a backup after an mmputacl or chown ACL change on a migrated file, it isn't recalled, so it cant be backing up the file. If I do the same change from windows over a smb mount, it does cause the file to be recalled and backedup. > ...I am not sure why you need POSIX ACL's if you are running Linux... >From what I have recently read... https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." As I said this system has had roles added to it. The original purpose was to only support NFS exports, then as a staging area for IT, as end user access wasn't needed, only POSIX permissions were used. No it has end user SMB mounts. >?chmodAndSetAcl? Saw this recently - will look at changing to that! https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_authoriziefileprotocolusers.htm "To allow proper use of ACLs, it is recommended to prevent chmod from overwriting the ACLs by setting this parameter to setAclOnly or chmodAndSetAcl." >#owner:root OK so you do have root as the owner. > special:owner@:rwxc:allow:FileInherit:DirInherit And have it propagated to children. > group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit We by default assign two groups to a folder, a RW and R only. > special:everyone@:----:allow > special:owner@:rwxc:allow > special:group@:rwx-:allow I have been removing these. 
This seems to work, but was set via windows: POSIX: d--------- 2 root root 512 Apr 11 2019 #NFSv4 ACL #owner:root #group:root #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED # NULL_SACL group:dg--ro:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:dg--rwm:rwx-:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:dl-:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED So is root as the owner the norm? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 15 October 2019 15:30 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C54e024b8b52b4a70208e08d7517c47fc%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637067466552637538&sdata=v43g1MEBnRBZP%2B5J7ORvywIq6poqhK24fTsCco0IEDo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 19:50:54 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 18:50:54 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, , Message-ID: Fred, I thought like you that an ACL change caused a backup with mmbackup. Maybe only if you change the NFSv4 ACL. I'm sure it's documented somewhere and there is a flag to Protect to stop this from happening. Maybe a POSIX permission (setfacl style) doesn't trigger a backup. This would tie in with Paul's suggestion that changing via SMB caused the backup to occur. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of stockf at us.ibm.com Sent: Tuesday, October 15, 2019 5:49:34 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs Thanks Paul. Could you please clarify which ACL you changed, the GPFS NFSv4 ACL or the POSIX ACL? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Paul Ward Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 12:18 PM Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. 
(This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 15 21:34:34 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 15 Oct 2019 20:34:34 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: On 15/10/2019 17:15, Paul Ward wrote: [SNIP] >> ...I am not sure why you need POSIX ACL's if you are running Linux... > From what I have recently read... > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm > "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." > Only if you are using the inbuilt kernel NFS server, which IMHO is awful from a management perspective. That is you have zero visibility into what the hell it is doing when it all goes pear shaped unless you break out dtrace. I am not sure that using dtrace on a production service to find out what is going on is "best practice". It also in my experience stops you cleanly shutting down most of the time. The sooner it gets removed from the kernel the better IMHO. If you are using protocol nodes which is the only supported option as far as I am aware then that does not apply. I would imagined if you are rolling your own Ganesha NFS server it won't matter either. Checking the code of the FSAL in Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. My understanding was one of the drivers for using Ganesha as an NFS server with GPFS was you can write a FSAL to do just that, in the same way as on Samba you load the vfs_gpfs module, unless you are into self flagellation I guess. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From YARD at il.ibm.com Wed Oct 16 05:41:39 2019 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 16 Oct 2019 07:41:39 +0300 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: Hi In case you want to review with ls -l the POSIX permissions, please put the relevant permissions on the SMB share, and add CREATOROWNER & CREATETORGROUP. Than ls -l will show you the owner + group + everyone permissions. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: Jonathan Buzzard To: "gpfsug-discuss at spectrumscale.org" Date: 15/10/2019 23:34 Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Sent by: gpfsug-discuss-bounces at spectrumscale.org On 15/10/2019 17:15, Paul Ward wrote: [SNIP] >> ...I am not sure why you need POSIX ACL's if you are running Linux... > From what I have recently read... > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm > "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." > Only if you are using the inbuilt kernel NFS server, which IMHO is awful from a management perspective. That is you have zero visibility into what the hell it is doing when it all goes pear shaped unless you break out dtrace. I am not sure that using dtrace on a production service to find out what is going on is "best practice". It also in my experience stops you cleanly shutting down most of the time. The sooner it gets removed from the kernel the better IMHO. If you are using protocol nodes which is the only supported option as far as I am aware then that does not apply. I would imagined if you are rolling your own Ganesha NFS server it won't matter either. Checking the code of the FSAL in Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. My understanding was one of the drivers for using Ganesha as an NFS server with GPFS was you can write a FSAL to do just that, in the same way as on Samba you load the vfs_gpfs module, unless you are into self flagellation I guess. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=b8w1GtIuT4M2ayhd-sZvIeIGVRrqM7QoXlh1KVj4Zq4&s=huFx7k3Vx10aZ-7AVq1HSVo825JPWVdFaEu3G3Dh-78&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3847 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4266 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 3747 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3793 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4301 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3739 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3855 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4338 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Oct 16 09:21:46 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 16 Oct 2019 08:21:46 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Oct 16 09:25:22 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 16 Oct 2019 08:25:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale Erasure Code Edition (ECE) RedPaper Draft is public now Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Oct 16 10:35:44 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 16 Oct 2019 09:35:44 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: On Wed, 2019-10-16 at 08:21 +0000, Malahal R Naineni wrote: > >> Ganesha shows functions for converting between GPFS ACL's and the > ACL format as used by Ganesha. > > Ganesha only supports NFSv4 ACLs, so the conversion is a quick one. > kernel NFS server converts NFSv4 ACLs to POSIX ACLs (the mapping > isn't perfect) as many of the Linux file systems only support POSIX > ACLs (at least this was the behavior). > Yes but the point is you don't need POSIX ACL's on your file system if you are doing NFS exports if you use Ganesha as your NFS server and only do NFSv4 exports. It is then down to the client to deal with the ACL's which the Linux client does. In fact it has for as long as I can remember. There are even tools to manipulate the NFSv4 ACL's (see nfs4- acl-tools on RHEL and derivatives). What's missing is "rich ACL" support in the Linux kernel. www.bestbits.at/richacl/ which seems to be down at the moment. Though there has been activity on the user space utilities. https://github.com/andreas-gruenbacher/richacl/ Is it possible to get IBM to devote some resources to moving this along. It would make using GPFS on Linux with ACL's a more pleasant experience. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From p.ward at nhm.ac.uk Wed Oct 16 11:59:03 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Wed, 16 Oct 2019 10:59:03 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: We are running GPFS 4.2.3 with Arcpix build 3.5.10 or 3.5.12. We don't have Ganesha in the build. I'm not sure about the NFS service. 
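If it is unclear which ACL semantics a given GPFS filesystem is actually enforcing, a quick check along the following lines may help before deciding whether POSIX ACLs are needed at all (a sketch only; the filesystem name gpfs0 and the example path are placeholders for the local setup):

  # show the ACL semantics in effect for the filesystem (-k = posix, nfs4 or all)
  mmlsfs gpfs0 -k

  # show the ACL stored on a particular file or directory in its native format
  mmgetacl /gpfs/gpfs0/shares/somedir

  # force NFSv4-style display of the same ACL
  mmgetacl -k nfs4 /gpfs/gpfs0/shares/somedir

A filesystem created with -k all can carry a mixture of POSIX and NFSv4 ACLs on different files, which is the "mixed ACL mode" situation described earlier in the thread.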
Thanks for the responses, its interesting how the discussion has branched into Ganesha and what ACL changes are picked up by Spectrum Protect and mmbackup (my next major change). Any more responses on what is the best practice for the default POSIX owner and group of files and folders, when NFSv4 ACLs are used for SMB shares? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 16 October 2019 10:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Wed, 2019-10-16 at 08:21 +0000, Malahal R Naineni wrote: >> Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. Ganesha only supports NFSv4 ACLs, so the conversion is a quick one. kernel NFS server converts NFSv4 ACLs to POSIX ACLs (the mapping isn't perfect) as many of the Linux file systems only support POSIX ACLs (at least this was the behavior). Yes but the point is you don't need POSIX ACL's on your file system if you are doing NFS exports if you use Ganesha as your NFS server and only do NFSv4 exports. It is then down to the client to deal with the ACL's which the Linux client does. In fact it has for as long as I can remember. There are even tools to manipulate the NFSv4 ACL's (see nfs4- acl-tools on RHEL and derivatives). What's missing is "rich ACL" support in the Linux kernel. https://l.antigena.com/l/wElAOKB71BMteh5p3MJsrMJ1piEPqSzVv7jGE7WAADAaMiBDMV~~SJdC~qYZEePn7-JksRn9_H6cg21GWyrYE77TnWcAWsMEnF3Nwuug0tRR7ud7GDl9vPM3iafYImA3LyGuQInuXsXilJ6R9e2qmotMPRr~Lsq9CHJ2fsu1dBR1EL622lakpWuKLhjucFNsxUODYLWWFMzVbWj_AigKVAIMEX8Xqs0hGKXpOmjJOTejZDjM8bOCA1-jl06wU3DoT-ad3latFOtGR-oTHHwhAmu792L7Grmas12aetAuhTHnCQ6BBtRLGR_-iVJFYKfdyJNMVsDeKcBEBKKFSZdF~7ozqBouoIAZPE6cOA8KQIeh6mt1~_n which seems to be down at the moment. Though there has been activity on the user space utilities. https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fandreas-gruenbacher%2Frichacl%2F&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C2c1e0145dadd4d35842508d7521c4b9c%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637068153793755413&sdata=aUmCoKIC1N5TU95ILatCp2IlmdJ1gKKL8y%2F1V3kWb3M%3D&reserved=0 Is it possible to get IBM to devote some resources to moving this along. It would make using GPFS on Linux with ACL's a more pleasant experience. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C2c1e0145dadd4d35842508d7521c4b9c%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637068153793755413&sdata=ZXLszye50npdSFIu1FuLK3eDbUd%2BV5h29xP1N3XD0jQ%3D&reserved=0 From stockf at us.ibm.com Wed Oct 16 12:14:46 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Oct 2019 11:14:46 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... 
URL: From TROPPENS at de.ibm.com Wed Oct 16 13:51:25 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 16 Oct 2019 14:51:25 +0200 Subject: [gpfsug-discuss] Nov 5 - Spectrum Scale China User Meeting Message-ID: IBM will host a Spectrum Scale User Meeting on November 5 in Shanghai. Senior engineers of our development lab in Beijing will attend and present. Please register here: https://www.spectrumscaleug.org/event/spectrum-scale-china-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at esquad.de Wed Oct 16 17:00:00 2019 From: lists at esquad.de (Dieter Mosbach) Date: Wed, 16 Oct 2019 18:00:00 +0200 Subject: [gpfsug-discuss] SMB support on ppc64LE / SLES for SpectrumScale - please vote for RFE Message-ID: <89482a10-bb53-4b49-d37f-7ef2efb28b30@esquad.de> We want to use smb-protocol-nodes for a HANA-SpectrumScale cluster, unfortunately these are only available for RHEL and not for SLES. SLES has a market share of 99% in the HANA environment. I have therefore created a Request for Enhancement (RFE). https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=137250 If you need it, too, please vote for it! Thank you very much! Kind regards Dieter -- Unix and Storage System Engineer HORNBACH-Baumarkt AG Bornheim, Germany From jonathan.buzzard at strath.ac.uk Wed Oct 16 22:32:50 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 16 Oct 2019 21:32:50 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> Message-ID: On 15/10/2019 16:41, Simon Thompson wrote: > I thought Spectrum Protect didn't actually backup again on a file > owner change. Sure mmbackup considers it, but I think Protect just > updates the metadata. There are also some other options for dsmc that > can stop other similar issues if you change ctime maybe. > > (Other backup tools are available) > It certainly used too. I spent six months carefully chown'ing files one user at a time so as not to overwhelm the backup, because the first group I did meant no backup for about a week... I have not kept a close eye on it and have just worked on the assumption for the last decade of "don't do that". If it is no longer the case I apologize for spreading incorrect information. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Wed Oct 16 22:46:48 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 16 Oct 2019 21:46:48 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> Message-ID: <20191016214648.pnmjmc65e6d4amqi@utumno.gs.washington.edu> On Wed, Oct 16, 2019 at 09:32:50PM +0000, Jonathan Buzzard wrote: > On 15/10/2019 16:41, Simon Thompson wrote: > > I thought Spectrum Protect didn't actually backup again on a file > > owner change. Sure mmbackup considers it, but I think Protect just > > updates the metadata. 
There are also some other options for dsmc that > > can stop other similar issues if you change ctime maybe. > > > > (Other backup tools are available) > > > > It certainly used too. I spent six months carefully chown'ing files one > user at a time so as not to overwhelm the backup, because the first > group I did meant no backup for about a week... > > I have not kept a close eye on it and have just worked on the assumption > for the last decade of "don't do that". If it is no longer the case I > apologize for spreading incorrect information. TSM can store some amount of metadata in its database without spilling over to a storage pool, so whether a metadata update is cheap or expensive depends not just on ACLs/extended attributes but also the directory entry name length. It can definitely make for some seemingly non-deterministic backup behavior. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From jonathan.buzzard at strath.ac.uk Thu Oct 17 11:26:45 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 17 Oct 2019 10:26:45 +0000 Subject: [gpfsug-discuss] mmbackup questions Message-ID: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> I have been looking to give mmbackup another go (a very long history with it being a pile of steaming dinosaur droppings last time I tried, but that was seven years ago). Anyway having done a backup last night I am curious about something that does not appear to be explained in the documentation. Basically the output has a line like the following Total number of objects inspected: 474630 What is this number? Is it the number of files that have changed since the last backup or something else as it is not the number of files on the file system by any stretch of the imagination. One would hope that it inspected everything on the file system... Also it appears that the shadow database is held on the GPFS file system that is being backed up. Is there any way to change the location of that? I am only using one node for backup (because I am cheap and don't like paying for more PVU's than I need to) and would like to hold it on the node doing the backup where I can put it on SSD. Which does to things firstly hopefully goes a lot faster, and secondly reduces the impact on the file system of the backup. Anyway a significant speed up (assuming it worked) was achieved but I note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load average never went above one) and we didn't touch the swap despite only have 24GB of RAM. Though the 10GbE networking did get busy during the transfer of data to the TSM server bit of the backup but during the "assembly stage" it was all a bit quiet, and the DSS-G server nodes where not busy either. What options are there for tuning things because I feel it should be able to go a lot faster. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Thu Oct 17 13:35:18 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 17 Oct 2019 12:35:18 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Thu Oct 17 15:17:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 17 Oct 2019 10:17:17 -0400 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: Along with what Fred wrote, you can look at the mmbackup doc and also peek into the script and find some options to look at the mmapplypolicy RULEs used, and also capture the mmapplypolicy output which will better show you which files and directories are being examined and so forth. --marc From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 10/17/2019 08:43 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmbackup questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Jonathan the "objects inspected" refers to the number of file system objects that matched the policy rules used for the backup. These rules are influenced by TSM server and client settings, e.g. the dsm.sys file. So not all objects in the file system are actually inspected. As for tuning I think the mmbackup man page is the place to start, and I think it is thorough in its description of the tuning options. You may also want to look at the mmapplypolicy man page since mmbackup invokes it to scan the file system for files that need to be backed up. To my knowledge there are no options to place the shadow database file in another location than the GPFS file system. If the file system has fast storage I see no reason why you could not use a placement policy rule to place the shadow database on that fast storage. However, I think using more than one node for your backups, and adjusting the various threads used by mmbackup will provide you with sufficient performance improvements. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Jonathan Buzzard Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] mmbackup questions Date: Thu, Oct 17, 2019 8:00 AM I have been looking to give mmbackup another go (a very long history with it being a pile of steaming dinosaur droppings last time I tried, but that was seven years ago). Anyway having done a backup last night I am curious about something that does not appear to be explained in the documentation. Basically the output has a line like the following Total number of objects inspected: 474630 What is this number? Is it the number of files that have changed since the last backup or something else as it is not the number of files on the file system by any stretch of the imagination. One would hope that it inspected everything on the file system... Also it appears that the shadow database is held on the GPFS file system that is being backed up. Is there any way to change the location of that? I am only using one node for backup (because I am cheap and don't like paying for more PVU's than I need to) and would like to hold it on the node doing the backup where I can put it on SSD. Which does to things firstly hopefully goes a lot faster, and secondly reduces the impact on the file system of the backup. Anyway a significant speed up (assuming it worked) was achieved but I note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load average never went above one) and we didn't touch the swap despite only have 24GB of RAM. 
Though the 10GbE networking did get busy during the transfer of data to the TSM server bit of the backup but during the "assembly stage" it was all a bit quiet, and the DSS-G server nodes where not busy either. What options are there for tuning things because I feel it should be able to go a lot faster. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=u_URaXsFxbEw29QGkpa5CnXVGJApxske9lAtEPlerYY&s=mWDp7ziqYJ65-FSCOArzVITL9_qBunPqZ9uC9jgjxn8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From skylar2 at uw.edu Thu Oct 17 15:26:03 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 17 Oct 2019 14:26:03 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: > I have been looking to give mmbackup another go (a very long history > with it being a pile of steaming dinosaur droppings last time I tried, > but that was seven years ago). > > Anyway having done a backup last night I am curious about something > that does not appear to be explained in the documentation. > > Basically the output has a line like the following > > Total number of objects inspected: 474630 > > What is this number? Is it the number of files that have changed since > the last backup or something else as it is not the number of files on > the file system by any stretch of the imagination. One would hope that > it inspected everything on the file system... I believe this is the number of paths that matched some include rule (or didn't match some exclude rule) for mmbackup. I would assume it would differ from the "total number of objects backed up" line if there were include/exclude rules that mmbackup couldn't process, leaving it to dsmc to decide whether to process. > Also it appears that the shadow database is held on the GPFS file system > that is being backed up. Is there any way to change the location of that? > I am only using one node for backup (because I am cheap and don't like > paying for more PVU's than I need to) and would like to hold it on the > node doing the backup where I can put it on SSD. Which does to things > firstly hopefully goes a lot faster, and secondly reduces the impact on > the file system of the backup. I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment variable noted in the mmbackup man path: Specifies an alternative directory name for storing all temporary and permanent records for the backup. The directory name specified must be an existing directory and it cannot contain special characters (for example, a colon, semicolon, blank, tab, or comma). 
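A minimal sketch of how that variable might be used to keep the shadow database and other working files on local SSD of a single backup node (untested; the path, filesystem name, node name and TSM server name are all placeholders, and if more than one node takes part in the backup the chosen directory would have to be reachable from all of them):

  mkdir -p /ssd/mmbackup-records            # must already exist, per the man page
  export MMBACKUP_RECORD_ROOT=/ssd/mmbackup-records
  mmbackup gpfs0 -t incremental -N backupnode --tsm-servers TSMSERVER1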
Which seems like it might provide a mechanism to store the shadow database elsewhere. For us, though, we provide storage via a cost center, so we would want our customers to eat the full cost of their excessive file counts. > Anyway a significant speed up (assuming it worked) was achieved but I > note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load > average never went above one) and we didn't touch the swap despite only > have 24GB of RAM. Though the 10GbE networking did get busy during the > transfer of data to the TSM server bit of the backup but during the > "assembly stage" it was all a bit quiet, and the DSS-G server nodes where > not busy either. What options are there for tuning things because I feel > it should be able to go a lot faster. We have some TSM nodes (corresponding to GPFS filesets) that stress out our mmbackup cluster at the sort step of mmbackup. UNIX sort is not RAM-friendly, as it happens. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From jonathan.buzzard at strath.ac.uk Thu Oct 17 19:04:47 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 17 Oct 2019 18:04:47 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> Message-ID: <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> On 17/10/2019 15:26, Skylar Thompson wrote: > On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: >> I have been looking to give mmbackup another go (a very long history >> with it being a pile of steaming dinosaur droppings last time I tried, >> but that was seven years ago). >> >> Anyway having done a backup last night I am curious about something >> that does not appear to be explained in the documentation. >> >> Basically the output has a line like the following >> >> Total number of objects inspected: 474630 >> >> What is this number? Is it the number of files that have changed since >> the last backup or something else as it is not the number of files on >> the file system by any stretch of the imagination. One would hope that >> it inspected everything on the file system... > > I believe this is the number of paths that matched some include rule (or > didn't match some exclude rule) for mmbackup. I would assume it would > differ from the "total number of objects backed up" line if there were > include/exclude rules that mmbackup couldn't process, leaving it to dsmc to > decide whether to process. > After digging through dsminstr.log it would appear to be the sum of the combination of new, changed and deleted files that mmbackup is going to process. There is some wierd sh*t going on though with mmbackup on the face of it, where it sends one file to the TSM server. A line with the total number of files in the file system (aka potential backup candidates) would be nice I think. >> Also it appears that the shadow database is held on the GPFS file system >> that is being backed up. Is there any way to change the location of that? >> I am only using one node for backup (because I am cheap and don't like >> paying for more PVU's than I need to) and would like to hold it on the >> node doing the backup where I can put it on SSD. 
Which does to things >> firstly hopefully goes a lot faster, and secondly reduces the impact on >> the file system of the backup. > > I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment > variable noted in the mmbackup man path: > > Specifies an alternative directory name for > storing all temporary and permanent records for > the backup. The directory name specified must > be an existing directory and it cannot contain > special characters (for example, a colon, > semicolon, blank, tab, or comma). > > Which seems like it might provide a mechanism to store the shadow database > elsewhere. For us, though, we provide storage via a cost center, so we > would want our customers to eat the full cost of their excessive file counts. > We have set a file quota of one million for all our users. So far only one users has actually needed it raising. It does however make users come and have a conversation with us about what they are doing. With the one exception they have found ways to do their work without abusing the file system as a database. We don't have a SSD storage pool on the file system so moving it to the backup node for which we can add SSD cheaply (I mean really really cheap these days) is more realistic that adding some SSD for a storage pool to the file system. Once I am a bit more familiar with it I will try changing it to the system disks. It's not SSD at the moment but if it works I can easily justify getting some and replacing the existing drives (it would just be two RAID rebuilds away). Last time it was brought up you could not add extra shelves to an existing DSS-G system, you had to buy a whole new one. This is despite the servers shipping with a full complement of SAS cards and a large box full of 12Gbps SAS cables (well over ?1000 worth at list I reckon) that are completely useless. Ok they work and I could use them elsewhere but frankly why ship them if I can't expand!!! >> Anyway a significant speed up (assuming it worked) was achieved but I >> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load >> average never went above one) and we didn't touch the swap despite only >> have 24GB of RAM. Though the 10GbE networking did get busy during the >> transfer of data to the TSM server bit of the backup but during the >> "assembly stage" it was all a bit quiet, and the DSS-G server nodes where >> not busy either. What options are there for tuning things because I feel >> it should be able to go a lot faster. > > We have some TSM nodes (corresponding to GPFS filesets) that stress out our > mmbackup cluster at the sort step of mmbackup. UNIX sort is not > RAM-friendly, as it happens. > I have configured more monitoring of the system, and will watch it over the coming days, but nothing was stressed on our system at all as far as I can tell but it was going slower than I had hoped. It was still way faster than a traditional dsmc incr but I was hoping for more though I am not sure why as the backup now takes place well inside my backup window. Perhaps I am being greedy. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Thu Oct 17 19:37:28 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Oct 2019 18:37:28 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> Message-ID: Mmbackup uses tsbuhelper internally. This is effectively a diff of the previous and current policy scan. Objects inspected is the count of these files that are changed since the last time and these are the candidates sent to the TSM server. You mention not being able to upgrade a DSS-G, I thought this has been available for sometime as a special bid process. We did something very complicated with ours at one point. I also thought the "no-upgrade" was related to a support position from IBM on creating additional DAs. You can't add new storage to an DA, but believe it's possible and now supported (I think) to add expansion shelves into a new DA. (I think ESS also supports this). Note that you don't necessarily get the same performance of doing this as if you'd purchased a fully stacked system in the first place. For example if you initially had 166 drives as a two expansion system and then add 84 drives in a new expansion, you now have two DAs, one smaller than the other and neither the same as if you'd originally created it with 250 drives... I don't actually have any benchmarks to prove this, but it was my understanding from various discussions over time. There are also now both DSS (and ESS) configs with both spinning and SSD enclosures. I assume these aren't special bid only products anymore. Simon ?On 17/10/2019, 19:05, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 17/10/2019 15:26, Skylar Thompson wrote: > On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: >> I have been looking to give mmbackup another go (a very long history >> with it being a pile of steaming dinosaur droppings last time I tried, >> but that was seven years ago). >> >> Anyway having done a backup last night I am curious about something >> that does not appear to be explained in the documentation. >> >> Basically the output has a line like the following >> >> Total number of objects inspected: 474630 >> >> What is this number? Is it the number of files that have changed since >> the last backup or something else as it is not the number of files on >> the file system by any stretch of the imagination. One would hope that >> it inspected everything on the file system... > > I believe this is the number of paths that matched some include rule (or > didn't match some exclude rule) for mmbackup. I would assume it would > differ from the "total number of objects backed up" line if there were > include/exclude rules that mmbackup couldn't process, leaving it to dsmc to > decide whether to process. > After digging through dsminstr.log it would appear to be the sum of the combination of new, changed and deleted files that mmbackup is going to process. There is some wierd sh*t going on though with mmbackup on the face of it, where it sends one file to the TSM server. A line with the total number of files in the file system (aka potential backup candidates) would be nice I think. >> Also it appears that the shadow database is held on the GPFS file system >> that is being backed up. 
Is there any way to change the location of that? >> I am only using one node for backup (because I am cheap and don't like >> paying for more PVU's than I need to) and would like to hold it on the >> node doing the backup where I can put it on SSD. Which does to things >> firstly hopefully goes a lot faster, and secondly reduces the impact on >> the file system of the backup. > > I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment > variable noted in the mmbackup man path: > > Specifies an alternative directory name for > storing all temporary and permanent records for > the backup. The directory name specified must > be an existing directory and it cannot contain > special characters (for example, a colon, > semicolon, blank, tab, or comma). > > Which seems like it might provide a mechanism to store the shadow database > elsewhere. For us, though, we provide storage via a cost center, so we > would want our customers to eat the full cost of their excessive file counts. > We have set a file quota of one million for all our users. So far only one users has actually needed it raising. It does however make users come and have a conversation with us about what they are doing. With the one exception they have found ways to do their work without abusing the file system as a database. We don't have a SSD storage pool on the file system so moving it to the backup node for which we can add SSD cheaply (I mean really really cheap these days) is more realistic that adding some SSD for a storage pool to the file system. Once I am a bit more familiar with it I will try changing it to the system disks. It's not SSD at the moment but if it works I can easily justify getting some and replacing the existing drives (it would just be two RAID rebuilds away). Last time it was brought up you could not add extra shelves to an existing DSS-G system, you had to buy a whole new one. This is despite the servers shipping with a full complement of SAS cards and a large box full of 12Gbps SAS cables (well over ?1000 worth at list I reckon) that are completely useless. Ok they work and I could use them elsewhere but frankly why ship them if I can't expand!!! >> Anyway a significant speed up (assuming it worked) was achieved but I >> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load >> average never went above one) and we didn't touch the swap despite only >> have 24GB of RAM. Though the 10GbE networking did get busy during the >> transfer of data to the TSM server bit of the backup but during the >> "assembly stage" it was all a bit quiet, and the DSS-G server nodes where >> not busy either. What options are there for tuning things because I feel >> it should be able to go a lot faster. > > We have some TSM nodes (corresponding to GPFS filesets) that stress out our > mmbackup cluster at the sort step of mmbackup. UNIX sort is not > RAM-friendly, as it happens. > I have configured more monitoring of the system, and will watch it over the coming days, but nothing was stressed on our system at all as far as I can tell but it was going slower than I had hoped. It was still way faster than a traditional dsmc incr but I was hoping for more though I am not sure why as the backup now takes place well inside my backup window. Perhaps I am being greedy. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Fri Oct 18 02:18:04 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 18 Oct 2019 01:18:04 +0000 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> Found my notes on this; very similar to what Behrooz was saying. This here is from ?mmfsadm dump waiters,selected_files?; as you can see here, we?re looking at thread 29168. Apparently below, ?inodeFlushHolder? corresponds to that same thread in the case I was looking at. You could then look up the inode with ?tsfindinode -i ?, so like for the below, "tsfindinode -i 41538053 /gpfs/cache? on our system. ===== dump waiters ==== Current time 2019-05-01_13:48:26-0400 Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 ===== dump selected_files ===== Current time 2019-05-01_13:48:36-0400 ... OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8 Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823 lock state [ wf: 1 ] x [] flags [ ] Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823 DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821 SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D: ) seq 170823 lock state [ M(2) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 (pfro+pfxw) seq 170822 BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823 treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823 treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw------- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0 locks held in mode xw: 0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0 BRL nXLocksOrRelinquishes 285 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 hasWriterInstance 1 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1 metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 3 dirty status: flushed dirtiedSyncNum 1477623 SMB oplock state: nWriters 1 indBlockDeallocLock: sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0 inodeValid 1 objectVersion 240 flushVersion 8086700 mnodeChangeCount 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 0 synchedFileSize 0 indirectionLevel 1 atime 1556732911.496160000 mtime 1556732911.496479000 ctime 1556732911.496479000 crtime 1556732911.496160000 owner uid 169589 gid 169589 > On Oct 10, 2019, at 4:43 PM, Damir Krstic wrote: > > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? > > in all my looking i have not been able to get that information out of various diagnostic commands. 
> > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Fri Oct 18 08:58:40 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Oct 2019 07:58:40 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> Message-ID: On 17/10/2019 19:37, Simon Thompson wrote: > Mmbackup uses tsbuhelper internally. This is effectively a diff of > the previous and current policy scan. Objects inspected is the count > of these files that are changed since the last time and these are the > candidates sent to the TSM server. > > You mention not being able to upgrade a DSS-G, I thought this has > been available for sometime as a special bid process. We did > something very complicated with ours at one point. I also thought the > "no-upgrade" was related to a support position from IBM on creating > additional DAs. You can't add new storage to an DA, but believe it's > possible and now supported (I think) to add expansion shelves into a > new DA. (I think ESS also supports this). Note that you don't > necessarily get the same performance of doing this as if you'd > purchased a fully stacked system in the first place. For example if > you initially had 166 drives as a two expansion system and then add > 84 drives in a new expansion, you now have two DAs, one smaller than > the other and neither the same as if you'd originally created it with > 250 drives... I don't actually have any benchmarks to prove this, but > it was my understanding from various discussions over time. > Well it was only the beginning of this year that we asked for a quote for expanding our DSS-G as part of a wider storage upgrade that was to be put to the IT funding committee at the university. I was expecting just to need some more shelves, only to told we need to start again. Like I said if that was the case why ship with all those extra unneeded and unusable SAS cards and SAS cables. At the very least it is not environmentally friendly. Then again the spec that came back had a 2x10Gb LOM, despite the DSS-G documentation being very explicit about needing a 4x1Gb LOM, which is still the case in the 2.4b documentation as of last month. I do note odd numbers of shelves other than one is now supported. That said the tools in at least 2.1 incorrectly states having one shelf is unsupported!!! Presumably they the person writing the tool only tested for even numbers not realizing one while odd was supported. You can also mix shelf types now, but again if I wanted to add some SSD it's a new DSS-G not a couple of D1224 shelves. That also nukes the DA argument for no upgrades I think because you would not be wanting to mix the two in that way. > There are also now both DSS (and ESS) configs with both spinning and > SSD enclosures. I assume these aren't special bid only products > anymore. I don't think so, along with odd numbers of shelves they are in general Lenovo literature. They also have a node with NVMe up the front (or more accurately up the back in PCIe slots), the DSS-G100. My take on the DSS-G is that it is a cost effective way to deploy GPFS storage. 
However there are loads of seemingly arbitrary quirks and limitations, a bit sh*t crazy upgrade procedure and questionable hardware maintenance. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Fri Oct 18 09:34:01 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 18 Oct 2019 16:34:01 +0800 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> References: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> Message-ID: Right for the example from Ryan(and according to the thread name, you know that it is writing to a file or directory), but for other cases, it may take more steps to figure out what access to which file is causing the long waiters(i.e., when mmap is being used on some nodes, or token revoke pending from some node, and etc.). Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 2019/10/18 09:18 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] waiters and files causing waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org Found my notes on this; very similar to what Behrooz was saying. This here is from ?mmfsadm dump waiters,selected_files?; as you can see here, we?re looking at thread 29168. Apparently below, ?inodeFlushHolder? corresponds to that same thread in the case I was looking at. You could then look up the inode with ?tsfindinode -i ?, so like for the below, "tsfindinode -i 41538053 /gpfs/cache? on our system. ===== dump waiters ==== Current time 2019-05-01_13:48:26-0400 Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 ===== dump selected_files ===== Current time 2019-05-01_13:48:36-0400 ... 
OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8 Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823 lock state [ wf: 1 ] x [] flags [ ] Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823 DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821 SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D: ) seq 170823 lock state [ M(2) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 (pfro+pfxw) seq 170822 BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823 treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823 treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw------- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0 locks held in mode xw: 0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0 BRL nXLocksOrRelinquishes 285 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 hasWriterInstance 1 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1 metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 3 dirty status: flushed dirtiedSyncNum 1477623 SMB oplock state: nWriters 1 indBlockDeallocLock: sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0 inodeValid 1 objectVersion 240 flushVersion 8086700 mnodeChangeCount 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 0 synchedFileSize 0 indirectionLevel 1 atime 1556732911.496160000 mtime 1556732911.496479000 ctime 1556732911.496479000 crtime 1556732911.496160000 owner uid 169589 gid 169589 > On Oct 10, 2019, at 4:43 PM, Damir Krstic wrote: > > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? > > in all my looking i have not been able to get that information out of various diagnostic commands. > > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=VdmIfneKWlidYoO2I90hBuZJ2VxXu8L8oq86E7zyh8Q&s=dkQrCzMmxeh6tu0UpPgSIphmRwcBiSpL7QbZPw5RNtI&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=VdmIfneKWlidYoO2I90hBuZJ2VxXu8L8oq86E7zyh8Q&s=dkQrCzMmxeh6tu0UpPgSIphmRwcBiSpL7QbZPw5RNtI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at well.ox.ac.uk Tue Oct 22 10:12:31 2019 From: jon at well.ox.ac.uk (Jon Diprose) Date: Tue, 22 Oct 2019 09:12:31 +0000 Subject: [gpfsug-discuss] AMD Rome support? Message-ID: Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? 
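As an aside for anyone benchmarking Scale on EPYC: Rome presents between one and four NUMA domains per socket depending on the NPS BIOS setting, so it is probably worth recording the topology and GPFS's NUMA-related setting alongside any results. A generic sketch, assuming nothing EPYC-specific in Scale itself:

  # NUMA layout as the OS sees it
  numactl --hardware
  lscpu | grep -i numa

  # current value of GPFS's numaMemoryInterleave parameter
  mmdiag --config | grep -i numa            # adjustable via mmchconfig if needed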
Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN From knop at us.ibm.com Tue Oct 22 17:30:38 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 22 Oct 2019 12:30:38 -0400 Subject: [gpfsug-discuss] AMD Rome support? In-Reply-To: References: Message-ID: Jon, AMD processors which are completely compatible with Opteron should also work. Please also refer to Q5.3 on the SMP scaling limit: 64 cores: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Jon Diprose To: gpfsug main discussion list Date: 10/22/2019 05:13 AM Subject: [EXTERNAL] [gpfsug-discuss] AMD Rome support? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=eizQJGD_5DpnaQUqNkIE3V9qJciVjfLCgo4ZHixZ5Ns&s=JomlTDVPlwFCvLtVOmGd4J6FrfbUK6cMVlLe5Ut638U&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Tue Oct 22 19:40:36 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Oct 2019 18:40:36 +0000 Subject: [gpfsug-discuss] AMD Rome support? In-Reply-To: References: Message-ID: <1c594dbd-4f5c-45aa-57aa-6b610d5c0e86@strath.ac.uk> On 22/10/2019 17:30, Felipe Knop wrote: > Jon, > > AMD processors which are completely compatible with Opteron should also > work. > > Please also refer to Q5.3 on the SMP scaling limit: 64 cores: > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > Hum, is that per CPU or the total for a machine? The reason I ask is we have some large memory nodes (3TB of RAM) and these are quad Xeon 6138 CPU's giving a total of 80 cores in the machine... We have not seen any problems, but if it is 64 cores per machine IBM needs to do some scaling testing ASAP to raise the limit as 64 cores per machine in 2019 is ridiculously low. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Stephan.Peinkofer at lrz.de Wed Oct 23 06:00:44 2019 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Wed, 23 Oct 2019 05:00:44 +0000 Subject: [gpfsug-discuss] AMD Rome support? 
In-Reply-To: References: Message-ID: <0E081EFD-538E-4E00-A625-54B99F57D960@lrz.de> Dear Jon, we run a bunch of AMD EPYC Naples Dual Socket servers with GPFS in our TSM Server Cluster. From what I can say it runs stable, but IO performance in general and GPFS performance in particular - even compared to an Xeon E5 v3 system - is rather poor. So to put that into perspective on the Xeon Systems with two EDR IB Links, we get 20GB/s read and write performance to GPFS using iozone very easily. On the AMD systems - with all AMD EPYC tuning suggestions applied you can find in the internet - we get around 15GB/s write but only 6GB/s read. We also opened a ticket at IBM for this but never found out anything. Probably because not many are running GPFS on AMD EPYC right now? The answer from AMD basically was that the bad IO performance is expected in Dual Socket systems because the Socket Interconnect is the bottleneck. (See also the IB tests DELL did https://www.dell.com/support/article/de/de/debsdt1/sln313856/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study?lang=en as soon as you have to cross the socket border you get only half of the IB performance) Of course with ROME everything get?s better (that?s what AMD told us through our vendor) but if you have the chance then I would recommend to benchmark AMD vs. XEON with your particular IO workloads before buying. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. (TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de On 22. Oct 2019, at 11:12, Jon Diprose > wrote: Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose > Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ivano.Talamo at psi.ch Wed Oct 23 10:49:02 2019 From: Ivano.Talamo at psi.ch (Talamo Ivano Giuseppe (PSI)) Date: Wed, 23 Oct 2019 09:49:02 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: Dear all, We are actually in the process of upgrading our CES cluster to 5.0.3-3 but we have doubts about how to proceed. Considering that the CES cluster is in production and heavily used, our plan is to add a new node with 5.0.3-3 to the cluster that is currently 5.0.2.1. And we would like to proceed in a cautious way, so that the new node would not take any IP and just one day per week (when we will declare to be ?at risk?) we would move some IPs to it. After some weeks of tests if we would see no problem we would upgrade the rest of the cluster. But reading these doc [1] it seems that we cannot have multiple GPFS/SMB version in the same cluster. So in that case we could not have a testing/acceptance phase but could only make the full blind jump. 
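Whatever the definitive answer on mixing levels turns out to be, it is probably worth recording exactly what each protocol node is running before starting a rolling upgrade; a rough sketch (cesNodes is the system-defined node class for protocol nodes, and the package name pattern may differ by release):

  # protocol nodes, their addresses and enabled services
  mmces node list
  mmces service list -a

  # installed Scale SMB/NFS packages on every protocol node
  mmdsh -N cesNodes 'rpm -qa | grep -E "gpfs.smb|ganesha" | sort'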
Can someone confirm or negate this? Thanks, Ivano [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_updatingsmb.htm On 04.10.19, 12:55, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Malahal R Naineni" wrote: You can use 5.0.3.3 . There is no fix for the sssd issue yet though. I will work with Ganesha upstream community pretty soon. Regards, Malahal. ----- Original message ----- From: Leonardo Sala To: gpfsug main discussion list , "Malahal R Naineni" , Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Fri, Oct 4, 2019 12:02 PM Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: >> @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Hi Ulrich, Ganesha uses innetgr() call for netgroup information and sssd has too many issues in its implementation. Redhat said that they are going to fix sssd synchronization issues in RHEL8. It is in my plate to serialize innergr() call in Ganesha to match kernel NFS server usage! I expect the sssd issue to give EACCESS/EPERM kind of issue but not EINVAL though. If you are using sssd, you must be getting into a sssd issue. Ganesha has a host-ip cache fix in 5.0.2 PTF3. Please make sure you use ganesha version V2.5.3-ibm030.01 if you are using netgroups (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) Regards, Malahal. ----- Original message ----- From: Ulrich Sibiller Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Thu, Dec 13, 2018 7:32 PM On 23.11.2018 14:41, Andreas Mattsson wrote: > Yes, this is repeating. > > We?ve ascertained that it has nothing to do at all with file operations on the GPFS side. > > Randomly throughout the filesystem mounted via NFS, ls or file access will give > > ? > > > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument > > ? > > Trying again later might work on that folder, but might fail somewhere else. > > We have tried exporting the same filesystem via a standard kernel NFS instead of the CES > Ganesha-NFS, and then the problem doesn?t exist. > > So it is definitely related to the Ganesha NFS server, or its interaction with the file system. > > Will see if I can get a tcpdump of the issue. We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is the culprit. 
Here some FULL_DEBUG output: 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , ) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18 The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1". I have also opened a support case at IBM for this. @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Kind regards, Ulrich Sibiller -- Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 72070 Tuebingen, Germany https://atos.net/de/deutschland/sc -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Wed Oct 23 10:56:57 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 23 Oct 2019 09:56:57 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch><717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 23 11:14:23 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 23 Oct 2019 10:14:23 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> From our experience, you can generally upgrade the GPFS code node by node, but the SMB code has to be identical on all nodes. So that's basically a do it one day and cross your fingers it doesn't break moment... but it is disruptive as well as you have to stop SMB to do the upgrade. I think there is a long standing RFE open on this about non disruptive SMB upgrades... Simon ?On 23/10/2019, 10:49, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ivano.Talamo at psi.ch" wrote: Dear all, We are actually in the process of upgrading our CES cluster to 5.0.3-3 but we have doubts about how to proceed. Considering that the CES cluster is in production and heavily used, our plan is to add a new node with 5.0.3-3 to the cluster that is currently 5.0.2.1. And we would like to proceed in a cautious way, so that the new node would not take any IP and just one day per week (when we will declare to be ?at risk?) we would move some IPs to it. After some weeks of tests if we would see no problem we would upgrade the rest of the cluster. But reading these doc [1] it seems that we cannot have multiple GPFS/SMB version in the same cluster. So in that case we could not have a testing/acceptance phase but could only make the full blind jump. Can someone confirm or negate this? Thanks, Ivano [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_updatingsmb.htm On 04.10.19, 12:55, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Malahal R Naineni" wrote: You can use 5.0.3.3 . There is no fix for the sssd issue yet though. I will work with Ganesha upstream community pretty soon. Regards, Malahal. ----- Original message ----- From: Leonardo Sala To: gpfsug main discussion list , "Malahal R Naineni" , Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Fri, Oct 4, 2019 12:02 PM Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? 
We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: >> @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Hi Ulrich, Ganesha uses innetgr() call for netgroup information and sssd has too many issues in its implementation. Redhat said that they are going to fix sssd synchronization issues in RHEL8. It is in my plate to serialize innergr() call in Ganesha to match kernel NFS server usage! I expect the sssd issue to give EACCESS/EPERM kind of issue but not EINVAL though. If you are using sssd, you must be getting into a sssd issue. Ganesha has a host-ip cache fix in 5.0.2 PTF3. Please make sure you use ganesha version V2.5.3-ibm030.01 if you are using netgroups (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) Regards, Malahal. ----- Original message ----- From: Ulrich Sibiller Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Thu, Dec 13, 2018 7:32 PM On 23.11.2018 14:41, Andreas Mattsson wrote: > Yes, this is repeating. > > We?ve ascertained that it has nothing to do at all with file operations on the GPFS side. > > Randomly throughout the filesystem mounted via NFS, ls or file access will give > > ? > > > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument > > ? > > Trying again later might work on that folder, but might fail somewhere else. > > We have tried exporting the same filesystem via a standard kernel NFS instead of the CES > Ganesha-NFS, and then the problem doesn?t exist. > > So it is definitely related to the Ganesha NFS server, or its interaction with the file system. > > Will see if I can get a tcpdump of the issue. We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is the culprit. 
Here some FULL_DEBUG output: 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , ) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18 The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1". I have also opened a support case at IBM for this. @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Kind regards, Ulrich Sibiller -- Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 72070 Tuebingen, Germany https://atos.net/de/deutschland/sc -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Oct 23 12:20:18 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Oct 2019 11:20:18 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: On Wed, 2019-10-23 at 10:14 +0000, Simon Thompson wrote: > From our experience, you can generally upgrade the GPFS code node by > node, but the SMB code has to be identical on all nodes. So that's > basically a do it one day and cross your fingers it doesn't break > moment... but it is disruptive as well as you have to stop SMB to do > the upgrade. I think there is a long standing RFE open on this about > non disruptive SMB upgrades... > My understanding is that the issue is the ctdb database suffers from basically being a "memory dump", so a change in the code can effect the database so all the nodes have to be the same. It's the same issue that historically plagued Microsoft Office file formats. Though of course you might get lucky and it just works. I have in the past in the days of role your own because there was no such thing as IBM provided Samba for GPFS done exactly that on several occasions. There was not warnings not to at the time... If you want to do testing before deployment a test cluster is the way forward. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Ivano.Talamo at psi.ch Wed Oct 23 12:23:22 2019 From: Ivano.Talamo at psi.ch (Talamo Ivano Giuseppe (PSI)) Date: Wed, 23 Oct 2019 11:23:22 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: Yes, thanks for the feedback. We already have a test cluster, so I guess we will go that way, just making sure to stay as close as possible to the production one. Cheers, Ivano On 23.10.19, 13:20, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On Wed, 2019-10-23 at 10:14 +0000, Simon Thompson wrote: > From our experience, you can generally upgrade the GPFS code node by > node, but the SMB code has to be identical on all nodes. So that's > basically a do it one day and cross your fingers it doesn't break > moment... 
but it is disruptive as well as you have to stop SMB to do > the upgrade. I think there is a long standing RFE open on this about > non disruptive SMB upgrades... > My understanding is that the issue is the ctdb database suffers from basically being a "memory dump", so a change in the code can effect the database so all the nodes have to be the same. It's the same issue that historically plagued Microsoft Office file formats. Though of course you might get lucky and it just works. I have in the past in the days of role your own because there was no such thing as IBM provided Samba for GPFS done exactly that on several occasions. There was not warnings not to at the time... If you want to do testing before deployment a test cluster is the way forward. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From A.Wolf-Reber at de.ibm.com Wed Oct 23 14:05:24 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 23 Oct 2019 13:05:24 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch><717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se><69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397183.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397184.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397185.png Type: image/png Size: 1134 bytes Desc: not available URL: From david_johnson at brown.edu Wed Oct 23 16:19:24 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 23 Oct 2019 11:19:24 -0400 Subject: [gpfsug-discuss] question about spectrum scale 5.0.3 installer Message-ID: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu> I built a test cluster a month ago on 14 nodes. Today I want to install two more NSD nodes. When I tried to run the installer, it looks like it is going back and fiddling with the nodes that were installed earlier, and are up and running with the filesystem mounted. I ended up having to abort the install (rebooted the two new nodes because they were stuck on multpath that had had earlier errors), and the messages indicated that the installation failed on all the existing NSD and GUI nodes, but no mention of the two that I wanted to install on. Do I have anything to worry about when I try again (now that multipath is fixed)? I want to be able to incrementally add servers and clients as we go along, and not have the installer messing up previous progress. Can I tell the installer exactly which nodes to work on? Thanks, ? 
ddj Dave Johnson Brown University From david_johnson at brown.edu Wed Oct 23 16:33:01 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 23 Oct 2019 11:33:01 -0400 Subject: [gpfsug-discuss] question about spectrum scale 5.0.3 installer In-Reply-To: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu> References: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu> Message-ID: <54DAC656-CEFE-4AF2-BB4F-9A595DD067C4@brown.edu> By the way, we have been dealing with adding and deleting nodes manually since GPFS 3.4, back in 2009. At what point is the spectrumscale command line utility more trouble than it?s worth? > On Oct 23, 2019, at 11:19 AM, David Johnson wrote: > > I built a test cluster a month ago on 14 nodes. Today I want to install two more NSD nodes. > When I tried to run the installer, it looks like it is going back and fiddling with the nodes that > were installed earlier, and are up and running with the filesystem mounted. > > I ended up having to abort the install (rebooted the two new nodes because they were stuck > on multpath that had had earlier errors), and the messages indicated that the installation failed > on all the existing NSD and GUI nodes, but no mention of the two that I wanted to install on. > > Do I have anything to worry about when I try again (now that multipath is fixed)? I want to be > able to incrementally add servers and clients as we go along, and not have the installer > messing up previous progress. Can I tell the installer exactly which nodes to work on? > > Thanks, > ? ddj > Dave Johnson > Brown University From Robert.Oesterlin at nuance.com Thu Oct 24 15:03:25 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 24 Oct 2019 14:03:25 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? Message-ID: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> We recently upgraded our GL4 to a GL6 (trouble free process for those considering FYI). I now have 615T free (raw) in each of my recovery groups. I?d like to increase the size of one of the file systems (currently at 660T, I?d like to add 100T). My first thought was going to be: mmvdisk vdiskset define --vdisk-set fsdata1 --recovery-group rg_gssio1-hs,rg_gssio2-hs --set-size 50T --code 8+2p --block-size 4m --nsd-usage dataOnly --storage-pool data mmvdisk vdiskset create --vdisk-set fs1data1 mmvdisk filesystem add --filesystem fs1 --vdisk-set fs1data1 I know in the past use of mixed size NSDs was frowned upon, not sure on the ESS. The other approach would be add two larger NSDs (current ones are 330T) of 380T, migrate the data to the new ones using mmrestripe, then delete the old ones. The other benefit of this process would be to have the file system data better balanced across all the storage enclosures. Any considerations before I do this? Thoughts? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Oct 24 16:54:50 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 24 Oct 2019 15:54:50 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> References: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... 
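For the second approach Bob describes (add two larger vdisk NSDs, rebalance, then retire the 330T ones), a rough sketch could look like the lines below. The vdisk-set name and size are made up for illustration, the flags simply mirror the ones already shown in the original post, and the usual caveat applies that rebalancing several hundred TB generates a lot of background I/O:

mmvdisk vdiskset define --vdisk-set fs1data2 --recovery-group rg_gssio1-hs,rg_gssio2-hs --set-size 380T --code 8+2p --block-size 4m --nsd-usage dataOnly --storage-pool data
mmvdisk vdiskset create --vdisk-set fs1data2
mmvdisk filesystem add --filesystem fs1 --vdisk-set fs1data2
mmrestripefs fs1 -b      # rebalance existing data across the old and new NSDs

Once the data is spread evenly, the old vdisk set could then be retired with mmvdisk filesystem delete and mmvdisk vdiskset delete; data still sitting on those NSDs should be migrated off as part of the removal, as with mmdeldisk. Mixing NSD sizes within one pool mainly affects how evenly data, and therefore I/O load, is spread across the enclosures, which is why the rebalance step matters here.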
URL: From A.Wolf-Reber at de.ibm.com Thu Oct 24 20:43:13 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Thu, 24 Oct 2019 19:43:13 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: References: , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156166.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156167.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156168.png Type: image/png Size: 1134 bytes Desc: not available URL: From lgayne at us.ibm.com Fri Oct 25 18:54:02 2019 From: lgayne at us.ibm.com (Lyle Gayne) Date: Fri, 25 Oct 2019 17:54:02 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: References: , , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156166.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156167.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156168.png Type: image/png Size: 1134 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Fri Oct 25 18:59:48 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 25 Oct 2019 19:59:48 +0200 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: References: , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1134 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Oct 28 14:02:57 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 28 Oct 2019 14:02:57 +0000 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP Message-ID: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> This relates to V 5.0.3. If my CES server node has system defined authentication using LDAP, should I expect that setting my authentication setting of ?userdefined? using mmuserauth to work? That doesn?t seem to be the case for me. Is there some other setting I should be using? I tried using LDAP in mmuserauth, and that promptly stomped on my sssd.conf file on that node which broke everything. Any by the way, stores a plain text password in the sssd.conf file just for good measure! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
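For reference, "userdefined" is the mode in which mmuserauth deliberately leaves the ID-mapping stack (sssd/nsswitch, in this case the node's own LDAP client setup) alone instead of rewriting it. A minimal sketch, assuming the local sssd/LDAP configuration already resolves the required users and groups and that a short interruption of file authentication is acceptable, might be:

mmuserauth service list                                   # see what file authentication is currently set to
mmuserauth service remove --data-access-method file       # disruptive: clears the current file auth config
mmuserauth service create --data-access-method file --type userdefined
id someuser                                               # sanity check that the node's own sssd still resolves users
getent group somegroup                                    # ...and groups ("someuser"/"somegroup" are placeholders)

Whether this behaves as expected on 5.0.3 with an already-working system-level LDAP setup is exactly the question above; the sketch only shows the commands that would normally be involved.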
URL: From valdis.kletnieks at vt.edu Mon Oct 28 17:12:08 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Mon, 28 Oct 2019 13:12:08 -0400 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP In-Reply-To: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> References: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> Message-ID: <55677.1572282728@turing-police> On Mon, 28 Oct 2019 14:02:57 -0000, "Oesterlin, Robert" said: > Any by the way, stores a plain text password in the sssd.conf file just for > good measure! Note that if you want the system to come up without intervention, at best you can only store an obfuscated password, not a securely encrypted one. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Tue Oct 29 10:14:57 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 29 Oct 2019 10:14:57 +0000 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP In-Reply-To: <55677.1572282728@turing-police> References: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> <55677.1572282728@turing-police> Message-ID: <1d324529a566cdd262a8874e48938002f9c1b4d0.camel@strath.ac.uk> On Mon, 2019-10-28 at 13:12 -0400, Valdis Kl?tnieks wrote: > On Mon, 28 Oct 2019 14:02:57 -0000, "Oesterlin, Robert" said: > > Any by the way, stores a plain text password in the sssd.conf file > > just for good measure! > > Note that if you want the system to come up without intervention, at > best you can only store an obfuscated password, not a securely > encrypted one. > Kerberos and a machine account spring to mind. Crazy given Kerberos is a Unix technology everyone seems to forget about it. Also my understanding is that in theory a TPM module in your server can be used for this https://en.wikipedia.org/wiki/Trusted_Platform_Module Support in Linux is weak at best, but basically it can be used to store passwords and it can be tied to the system. Locality and physical presence being the terminology used. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From linesr at janelia.hhmi.org Thu Oct 31 15:23:59 2019 From: linesr at janelia.hhmi.org (Lines, Robert) Date: Thu, 31 Oct 2019 15:23:59 +0000 Subject: [gpfsug-discuss] Inherited ACLs and multi-protocol access Message-ID: I know I am missing something here and it is probably due to lack of experience dealing with ACLs as all other storage we distil down to just posix UGO permissions. We have Windows native clients creating data. There are SMB clients of various flavors accessing data via CES. Then there are Linux native clients that interface between gpfs and other NFS filers for data movement. What I am running into is around inheriting permissions so that windows native and smb clients have access based on the users group membership that remains sane while also being able to migrate files off to nfs filers with reasonable posix permissions. Here is the top level directory that is the lab name and there is a matching group. That directory is the highest point where an ACL has been set with inheritance. The directory listed is one created from a Windows Native client. The issue I am running into is that that largec7 directory that was created is having the posix permissions set to nothing for the owner. 
The ACL that results is okay but when that folder or anything in it is synced off to another filer that only has the basic posix permission it acts kinda wonky. The user was able to fix up his files on the other filer because he was still the owner but I would like to make it work properly. [root at gpfs-dm1 smith]# ls -la drwxrwsr-x 84 root smith 16384 Oct 30 23:22 . d---rwsr-x 2 tim smith 4096 Oct 30 23:22 largec7 drwx--S--- 2 tim smith 4096 Oct 24 00:17 CFA1 [root at gpfs-dm1 smith]# mmgetacl . #NFSv4 ACL #owner:root #group:smith special:owner@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED [root at gpfs-dm1 smith]# mmgetacl largec7 #NFSv4 ACL #owner:tim #group:smith #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED user:root:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED In contrast the CFA1 directory was created prior to the file and directory inheritance being put in place. That worked okay as long as it was only that user but the lack of group access is a problem and what led to trying to sort out the inherited ACLs in the first place. [root at gpfs-dm1 smith]# ls -l drwx--S--- 2 tim smith 4096 Oct 24 00:17 CFA1 [root at gpfs-dm1 smith]# mmgetacl CFA1 #NFSv4 ACL #owner:tim #group:smith #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED special:owner@:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED user:15000001:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED user:15000306:r-x-:allow (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Thank you for any suggestions. -- Rob Lines Sr. HPC Engineer HHMI Janelia Research Campus -------------- next part -------------- An HTML attachment was scrubbed... 
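One way to experiment with the inheritance behaviour described above is to dump the top-level ACL to a file, adjust the inheritable entries (for example adding an explicit group:smith entry with FileInherit/DirInherit rather than relying only on the special:owner@/group@/everyone@ entries), and push it back. The path below is a placeholder and this is only a sketch of the mechanics, not a confirmed fix for the empty owner-mode-bits problem:

mmgetacl -o /tmp/smith.acl /gpfs/fs1/smith     # dump the current NFSv4 ACL to a file
vi /tmp/smith.acl                              # adjust the inheritable entries
mmputacl -i /tmp/smith.acl /gpfs/fs1/smith     # apply the edited ACL back to the top-level directory
mmeditacl /gpfs/fs1/smith                      # or edit it interactively instead

Note that existing subdirectories such as largec7 keep the ACL they inherited at creation time, so a change to the parent only affects newly created files and directories unless it is re-applied further down the tree.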
@Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 1 16:15:00 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 1 Oct 2019 15:15:00 +0000 Subject: [gpfsug-discuss] verbsPortsOutOfOrder Message-ID: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From TOMP at il.ibm.com Wed Oct 2 11:53:59 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 2 Oct 2019 13:53:59 +0300 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=dwtQhITjaULogq0l7wR3LfWDiy4R6tpPWq81EvnuA_o&s=LyZT2j0hkAP9pJTkYU40ZkexzkG6RFRqDcS9rSrapRc&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Wed Oct 2 18:02:06 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 2 Oct 2019 13:02:06 -0400 Subject: [gpfsug-discuss] CIFS protocol access does not honor secondary groups Message-ID: After converting from clustered CIFS to CES protocols, we?ve noticed that SMB users can?t access files owned by groups that they are members of, unless that group happens to be their primary group. Have read the smb.conf man page, and don?t see anything obvious that would control this? What might we be missing? Thanks, ? ddj Dave Johnson Brown University CCV/CIS From frederik.ferner at diamond.ac.uk Wed Oct 2 19:41:14 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Wed, 2 Oct 2019 19:41:14 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Message-ID: <0a5f042f-2715-c436-34a1-27c0ba529a70@diamond.ac.uk> Hello Heiner, very interesting, thanks. In our case we are seeing this problem on gpfs.nfs-ganesha-gpfs-2.5.3-ibm036.05.el7, so close to the version where you're seeing it. Frederik On 23/09/2019 10:33, Billich Heinrich Rainer (ID SD) wrote: > Hello Frederik, > > Thank you. I now see a similar behavior: Ganesha has 500k open files while the node is suspended since 2+hours. I would expect that some cleanup job does remove most of the open FD after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: > > ! maxFilesToCache 1048576 > ! maxStatCache 2097152 > > Our ganesha version is 2.5.3. (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern. 
> > I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. > > Cheers, > Heiner > > Daniel Gryniewicz > So, it's not impossible, based on the workload, but it may also be a bug. > > For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know > when the client closes the FD, and opening/closing all the time causes a > large performance hit. So, we cache open FDs. > > All handles in MDCACHE live on the LRU. This LRU is divided into 2 > levels. Level 1 is more active handles, and they can have open FDs. > Various operation can demote a handle to level 2 of the LRU. As part of > this transition, the global FD on that handle is closed. Handles that > are actively in use (have a refcount taken on them) are not eligible for > this transition, as the FD may be being used. > > We have a background thread that runs, and periodically does this > demotion, closing the FDs. This thread runs more often when the number > of open FDs is above FD_HwMark_Percent of the available number of FDs, > and runs constantly when the open FD count is above FD_Limit_Percent of > the available number of FDs. > > So, a heavily used server could definitely have large numbers of FDs > open. However, there have also, in the past, been bugs that would > either keep the FDs from being closed, or would break the accounting (so > they were closed, but Ganesha still thought they were open). You didn't > say what version of Ganesha you're using, so I can't tell if one of > those bugs apply. > > Daniel > > ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: > > Heiner, > > we are seeing similar issues with CES/ganesha NFS, in our case it > exclusively with NFSv3 clients. > > What is maxFilesToCache set to on your ganesha node(s)? In our case > ganesha was running into the limit of open file descriptors because > maxFilesToCache was set at a low default and for now we've increased it > to 1M. > > It seemed that ganesha was never releasing files even after clients > unmounted the file system. > > We've only recently made the change, so we'll see how much that improved > the situation. > > I thought we had a reproducer but after our recent change, I can now no > longer successfully reproduce the increase in open files not being released. > > Kind regards, > Frederik > > On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > > Hello, > > > > Is it usual to see 200?000-400?000 open files for a single ganesha > > process? Or does this indicate that something ist wrong? > > > > We have some issues with ganesha (on spectrum scale protocol nodes) > > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > > have a large number of open files, 200?000-400?000 open files per daemon > > (and 500 threads and about 250 client connections). Other nodes have > > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > > > If someone could explain how ganesha decides which files to keep open > > and which to close that would help, too. As NFSv3 is stateless the > > client doesn?t open/close a file, it?s the server to decide when to > > close it? We do have a few NFSv4 clients, too. > > > > Are there certain access patterns that can trigger such a large number > > of open file? Maybe traversing and reading a large number of small files? > > > > Thank you, > > > > Heiner > > > > I did count the open files by counting the entries in /proc/ > ganesha>/fd/ . 
With several 100k entries I failed to do a ?ls -ls? to > > list all the symbolic links, hence I can?t relate the open files to > > different exports easily. > > > > I did post this to the ganesha mailing list, too. > > > > -- > > > > ======================= > > > > Heinrich Billich > > > > ETH Z?rich > > > > Informatikdienste > > > > Tel.: +41 44 632 72 56 > > > > heinrich.billich at id.ethz.ch > > > > ======================== > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > -- > This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. > Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. > Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. > Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Frederik Ferner Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 Duty Sys Admin can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) From kkr at lbl.gov Thu Oct 3 01:01:39 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 2 Oct 2019 17:01:39 -0700 Subject: [gpfsug-discuss] Slides from last US event, hosting and speaking at events and plan for next events Message-ID: Hi all, The slides from the UG event at NERSC/LBNL are making there way here: https://www.spectrumscaleug.org/presentations/ Most of them are already in place. Thanks to all who attended, presented and participated. It?s great when we have interactive discussions at these events. We?d like to ask you, as GPFS/Spectrum Scale users, to consider hosting a future UG event at your site or giving a site update. I?ve been asked *many times*, why aren?t there more site updates? So you tell me?is there a barrier that I?m not aware of? We?re a friendly group (really!) and want to hear about your successes and your failures. We all learn from each other. Let me know if you have any thoughts about this. As a reminder, there is an upcoming Australian event and 2 upcoming US events Australia ? Sydney October 18th https://www.spectrumscaleug.org/event/spectrum-scale-user-group-at-ibm-systems-technical-university-australia/ US ? NYC October 10th https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ ? 
SC19 at Denver November 17th - This year we will include a morning session for new users and lunch. Online agenda will be available soon. https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Any feedback for the agendas for these events, or in general, please let us know. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruben.cremades at roche.com Thu Oct 3 08:18:03 2019 From: ruben.cremades at roche.com (Cremades, Ruben) Date: Thu, 3 Oct 2019 09:18:03 +0200 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Thanks Tomer, I have opened TS002806998 Regards Ruben On Wed, Oct 2, 2019 at 12:54 PM Tomer Perry wrote: > Simon, > > It looks like its setting the Out Of Order MLX5 environmental parameter: > > *https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs* > > > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 01/10/2019 18:17 > Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi, > > In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. > Could anyone comment on what that might do and if it relates to the > ordering that ?verbsPorts? are set? > > Thanks > > Simon_______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Rub?n Cremades Science Infrastructure F.Hoffmann-La Roche Ltd. Bldg 254 / Room 04 - NBH01 Wurmisweg 4303 - Kaiseraugst Phone: +41-61-687 26 25 ruben.cremades at roche.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Oct 3 10:14:01 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 3 Oct 2019 09:14:01 +0000 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Thanks Tomer. That makes sense, also not something I think we need to worry about ? I assume that relates to hypercube or dragonfly or some such though the Mellanox docs only say ?some topologies? 
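For anyone else wondering, it is at least easy to check whether the option is set anywhere before worrying about it. A small sketch that only inspects the configuration (I have not experimented with changing the value):

# value the daemon is currently running with
mmdiag --config | grep -i verbsPortsOutOfOrder

# whether it is explicitly set in the cluster configuration, and for which nodes
mmlsconfig verbsPortsOutOfOrder

# for comparison, the RDMA ports themselves
mmlsconfig verbsPorts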
Simon From: on behalf of "TOMP at il.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 2 October 2019 at 11:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] verbsPortsOutOfOrder Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Oct 3 10:17:15 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 3 Oct 2019 09:17:15 +0000 Subject: [gpfsug-discuss] CIFS protocol access does not honor secondary groups In-Reply-To: References: Message-ID: This works for us, so it's something that should work. It's probably related to the way your authentication is setup, we used to use custom from before IBM supporting AD+LDAP and we had to add entries for the group SID in the LDAP server also, but since moving to "supported" way of doing this, we don't think we need this anymore.. You might want to do some digging with the wbinfo command and see if groups/SIDs resolve both ways, but I'd suggest opening a PMR on this. You could also check what file-permissions look like with mmgetacl. In the past we've seen some funkiness where creator/owner isn't on/inherited, so if the user owns the file/directory but the permission is to the group rather than directly the user, they can create new files but then not read them afterwards (though other users in the group can). I forget the exact details as we worked a standard inheritable ACL that works for us __ Simon ?On 02/10/2019, 18:02, "gpfsug-discuss-bounces at spectrumscale.org on behalf of David Johnson" wrote: After converting from clustered CIFS to CES protocols, we?ve noticed that SMB users can?t access files owned by groups that they are members of, unless that group happens to be their primary group. Have read the smb.conf man page, and don?t see anything obvious that would control this? What might we be missing? Thanks, ? ddj Dave Johnson Brown University CCV/CIS _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From TOMP at il.ibm.com Thu Oct 3 10:44:32 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 3 Oct 2019 12:44:32 +0300 Subject: [gpfsug-discuss] verbsPortsOutOfOrder In-Reply-To: References: <139F1B36-A1EE-4D3C-A50A-1F15D8BCD242@bham.ac.uk> Message-ID: Simon, I believe that adaptive routing might also introduce out of order packets - but I would ask Mellanox as to when they recommend to use it. 
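Given that it maps to an MLX5 verbs environment setting, it presumably only matters on mlx5-generation HCAs (ConnectX-4 and later). A quick way to check what a node actually has, assuming the standard OFED tools are installed:

# hca_id shows mlx4_* vs mlx5_*, i.e. the driver generation per adapter
ibv_devinfo | grep -E 'hca_id|board_id|fw_ver'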
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: gpfsug main discussion list Date: 03/10/2019 12:14 Subject: [EXTERNAL] Re: [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Thanks Tomer. That makes sense, also not something I think we need to worry about ? I assume that relates to hypercube or dragonfly or some such though the Mellanox docs only say ?some topologies? Simon From: on behalf of "TOMP at il.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 2 October 2019 at 11:54 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] verbsPortsOutOfOrder Simon, It looks like its setting the Out Of Order MLX5 environmental parameter: https://docs.mellanox.com/display/MLNXOFEDv451010/Out-of-Order+%28OOO%29+Data+Placement+Experimental+Verbs Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 01/10/2019 18:17 Subject: [EXTERNAL] [gpfsug-discuss] verbsPortsOutOfOrder Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, In mmdiag --config, we see ?verbsPortsOutOfOrder? as an unset option. Could anyone comment on what that might do and if it relates to the ordering that ?verbsPorts? are set? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=rn4emIykuWgljnk6nj_Ay8TFU177BWp8qeaVAjmenfM&s=dO3QHcwm0oVHnHKGtdwIi2Q8mXWvL6JPmU7aVuRRMx0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Oct 3 14:55:19 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 3 Oct 2019 13:55:19 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: , , Message-ID: <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> After further investigaion, it seems like this XDS software is using memory mapped io when operating on the files. Is it possible that MMAP IO has a higher performance hit by AFM than regular file access? /Andreas ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Andreas Mattsson Skickat: den 1 oktober 2019 08:33:35 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, I've tried increasing all the refresh intervals, but even at 300 seconds, there is very little performance increase. The job runs in several steps, and gets held up at two places, as far as I can see. 
First at a kind of parallelisation step where about 1000-3000 files are created in the current working folder on a single compute node, and then at a step where lots of small output files are written on each of the compute nodes involved in the job. Comparing with running the same data set on a non-AFM cache fileset in the same storage system, it runs at least a factor 5 slower, even with really high refresh intervals. In the Scale documentation, it states that the afmRefreshAsync is only configurable cluster wide. Is it also configurable on a per-fileset level? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.3/com.ibm.spectrum.scale.v5r03.doc/bl1adm_configurationparametersAFM.htm The software is XDS, http://xds.mpimf-heidelberg.mpg.de/ Unfortunately it is a closed source software, so it is not possible to adapt the software. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Venkateswara R Puvvada Skickat: den 27 september 2019 10:23:13 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [cid:_4_DB7D1BA8DB7D1920002E115D65258482] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ATT00001.png Type: image/png Size: 4232 bytes Desc: ATT00001.png URL: From christof.schmitt at us.ibm.com Thu Oct 3 17:02:17 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 3 Oct 2019 16:02:17 +0000 Subject: [gpfsug-discuss] =?utf-8?q?CIFS_protocol_access_does_not_honor_se?= =?utf-8?q?condary=09groups?= In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Oct 3 18:15:04 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 3 Oct 2019 17:15:04 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Oct 3 18:31:34 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 3 Oct 2019 17:31:34 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From will.schmied at stjude.org Thu Oct 3 19:59:22 2019 From: will.schmied at stjude.org (Schmied, Will) Date: Thu, 3 Oct 2019 18:59:22 +0000 Subject: [gpfsug-discuss] Job: HPC Storage Architect at St. Jude Message-ID: <277C9DAD-06A2-4BD9-906F-83BFDDCDD965@stjude.org> Happy almost Friday everyone, St. Jude Children?s Research Hospital (Memphis, TN) has recently posted a job opening for a HPC Storage Architect, a senior level position working primarily to operate and maintain multiple Spectrum Scale clusters in support of research and other HPC workloads. You can view the job posting, and begin your application, here: http://myjob.io/nd6qd You can find all jobs, and information about working at St. Jude, here: https://www.stjude.org/jobs/hospital.html Please feel free to contact me directly off list if you have any questions. I?ll also be at SC this year and hope to see you there. Thanks, Will ________________________________ Email Disclaimer: www.stjude.org/emaildisclaimer Consultation Disclaimer: www.stjude.org/consultationdisclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Fri Oct 4 06:49:35 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 4 Oct 2019 05:49:35 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From leonardo.sala at psi.ch Fri Oct 4 07:32:42 2019 From: leonardo.sala at psi.ch (Leonardo Sala) Date: Fri, 4 Oct 2019 08:32:42 +0200 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. 
Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: > >> @Malahal: Looks like you have written the netgroup caching code, > feel free to ask for further details if required. > Hi Ulrich, Ganesha uses innetgr() call for netgroup information and > sssd has too many issues in its implementation. Redhat said that they > are going to fix sssd synchronization issues in RHEL8. It is in my > plate to serialize innergr() call in Ganesha to match kernel NFS > server usage! I expect the sssd issue to give EACCESS/EPERM kind of > issue but not EINVAL though. > If you are using sssd, you must be getting into a sssd issue. > Ganesha?has a host-ip cache fix in 5.0.2 PTF3. Please make sure you > use ganesha version?V2.5.3-ibm030.01 if you are using netgroups > (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) > Regards, Malahal. > > ----- Original message ----- > From: Ulrich Sibiller > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS > Date: Thu, Dec 13, 2018 7:32 PM > On 23.11.2018 14:41, Andreas Mattsson wrote: > > Yes, this is repeating. > > > > We?ve ascertained that it has nothing to do at all with file > operations on the GPFS side. > > > > Randomly throughout the filesystem mounted via NFS, ls or file > access will give > > > > ? > > > > ?> ls: reading directory /gpfs/filessystem/test/testdir: Invalid > argument > > > > ? > > > > Trying again later might work on that folder, but might fail > somewhere else. > > > > We have tried exporting the same filesystem via a standard > kernel NFS instead of the CES > > Ganesha-NFS, and then the problem doesn?t exist. > > > > So it is definitely related to the Ganesha NFS server, or its > interaction with the file system. > > ?> Will see if I can get a tcpdump of the issue. > > We see this, too. We cannot trigger it. Fortunately I have managed > to capture some logs with > debugging enabled. I have now dug into the ganesha 2.5.3 code and > I think the netgroup caching is > the culprit. > > Here some FULL_DEBUG output: > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for > export id 1 path /gpfsexport > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? 
?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] client_match > :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 > (options=421021e2root_squash ? , RWrw, > 3--, ---, TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ?-2, > anon_gid= ? ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get > :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :EXPORT ?(options=03303002 ? ? > ? ? ? ? ?, ? ? , ? ?, > ?? ? ?, ? ? ? ? ? ? ? , -- Deleg, ? ? ? ? ? ? ? ?, ? ? ? ?) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS > (options=42102002root_squash ? , ----, 3--, ---, > TCP, ----, Manage_Gids ? , ? ? ? ? , anon_uid= ? ?-2, anon_gid= ? > ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :default options > (options=03303002root_squash ? , ----, 34-, UDP, > TCP, ----, No Manage_Gids, -- Deleg, anon_uid= ? ?-2, anon_gid= ? > ?-2, none, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] > export_check_access :EXPORT :M_DBG :Final options > (options=42102002root_squash ? , ----, 3--, ---, > TCP, ----, Manage_Gids ? , -- Deleg, anon_uid= ? ?-2, anon_gid= ? > ?-2, sys) > 2018-12-13 11:53:41 : epoch 0009008d : server1 : > gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute > :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to > access Export_Id 1 /gpfsexport, > vers=3, proc=18 > > The client "client1" is definitely a member of the "netgroup1". > But the NETGROUP_CLIENT lookups for > "netgroup2" and "netgroup3" can only happen if the netgroup > caching code reports that "client1" is > NOT a member of "netgroup1". > > I have also opened a support case at IBM for this. > > @Malahal: Looks like you have written the netgroup caching code, > feel free to ask for further > details if required. > > Kind regards, > > Ulrich Sibiller > > -- > Dipl.-Inf. Ulrich Sibiller ? ? ? ? ? science + computing ag > System Administration ? ? ? ? ? ? ? ? ? ?Hagellocher Weg 73 > ?? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 72070 Tuebingen, Germany > https://atos.net/de/deutschland/sc > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.schlipalius at pawsey.org.au Fri Oct 4 07:37:17 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Fri, 04 Oct 2019 14:37:17 +0800 Subject: [gpfsug-discuss] 2019 October 18th Australian Spectrum Scale User Group event - last call for user case speakers Message-ID: Hello all, This is the final announcement for the Spectrum Scale Usergroup Sydney Australia on Friday the 18th October 2019. All current Australian Spectrum Scale User Group event details can be found here: http://bit.ly/2YOFQ3u Last call for user case speakers please ? let me know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. This is a combined Spectrum Scale, Spectrum Archive, Spectrum Protect and Spectrum LSF event. We are looking forwards to a great Usergroup in Sydney. Thanks again to IBM for helping to arrange the venue and event booking. Best Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) GPFSUGAUS at gmail.com From mnaineni at in.ibm.com Fri Oct 4 11:55:20 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 4 Oct 2019 10:55:20 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch>, <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 4 16:51:34 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 4 Oct 2019 15:51:34 +0000 Subject: [gpfsug-discuss] Lenovo GSS Planned End-of-Support Message-ID: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi there, Anyone know for sure when Lenovo is planning to release it's last version of the GSS software for its GSS solutions? I figure someone might be sufficiently plugged into the development here. Thanks! - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdqfgAKCRCZv6Bp0Ryx vuDHAJ9vO2/G6YLVbnoifliLDztMcVhENgCg01jB7VhZA9M85hKUe2FUOrKRios= =4iyR -----END PGP SIGNATURE----- From ncalimet at lenovo.com Fri Oct 4 16:59:03 2019 From: ncalimet at lenovo.com (Nicolas CALIMET) Date: Fri, 4 Oct 2019 15:59:03 +0000 Subject: [gpfsug-discuss] [External] Lenovo GSS Planned End-of-Support In-Reply-To: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> References: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> Message-ID: Ryan, If the question really is for how long GSS will be supported, then maintenance releases are on the roadmap till at least 2022 in principle. If otherwise you are referring to the latest GSS code levels, then GSS 3.4b has been released late August. 
Regards, - Nicolas -- Nicolas Calimet, PhD | HPC System Architect | Lenovo DCG | Meitnerstrasse 9, D-70563 Stuttgart, Germany | +49 71165690146 -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Ryan Novosielski Sent: Friday, October 4, 2019 17:52 To: gpfsug-discuss at spectrumscale.org Subject: [External] [gpfsug-discuss] Lenovo GSS Planned End-of-Support -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi there, Anyone know for sure when Lenovo is planning to release it's last version of the GSS software for its GSS solutions? I figure someone might be sufficiently plugged into the development here. Thanks! - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdqfgAKCRCZv6Bp0Ryx vuDHAJ9vO2/G6YLVbnoifliLDztMcVhENgCg01jB7VhZA9M85hKUe2FUOrKRios= =4iyR -----END PGP SIGNATURE----- _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Fri Oct 4 17:15:08 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 4 Oct 2019 16:15:08 +0000 Subject: [gpfsug-discuss] [External] Lenovo GSS Planned End-of-Support In-Reply-To: References: <8cde86c2-3277-1a3a-7f91-62199158f6c4@rutgers.edu> Message-ID: <5228bcf4-fe1b-cfc7-e1aa-071131496011@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Yup, that's the question; thanks for the help. I'd heard a rumor that there was a 2020 date, and wanted to see if I could get any indication in particular as to whether that was true. Sounds like even if it's not 2022, it's probably not 2020. We're clear on the current version -- planning the upgrade at the moment . On 10/4/19 11:59 AM, Nicolas CALIMET wrote: > Ryan, > > If the question really is for how long GSS will be supported, then > maintenance releases are on the roadmap till at least 2022 in > principle. If otherwise you are referring to the latest GSS code > levels, then GSS 3.4b has been released late August. > > Regards, - Nicolas > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iF0EARECAB0WIQST3OUUqPn4dxGCSm6Zv6Bp0RyxvgUCXZdwBAAKCRCZv6Bp0Ryx vjAWAJ9OGbVfhM0m+/NXCRzXo8raIj/tNwCeMtg0osqnl3l16J4TC3oZGw9xxk4= =utaK -----END PGP SIGNATURE----- From kkr at lbl.gov Fri Oct 4 21:53:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 4 Oct 2019 13:53:20 -0700 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? Message-ID: Hi, There is a flag with mmlsquota to prevent the potential of getting negative values back: -e Specifies that mmlsquota is to collect updated quota usage data from all nodes before displaying results. If -e is not specified, there is the potential to display negative usage values as the quota server may process a combination of up-to-date and back-level information. However, we are using the API to collectively show quotas across GPFS and non-GPFS filesystems via one user-driven command. We are getting negative values using the API. 
Does anyone know the -e equivalent for the API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm Thanks, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Sat Oct 5 05:30:49 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Sat, 5 Oct 2019 10:00:49 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> References: , , <7476c598c32440f1bffe7d9e950c0965@maxiv.lu.se> Message-ID: I would recommend opening a case, collect the default traces from both gateway and application (or protocol) nodes to check the RPC overhead. There should not be difference between mmap IO and regular IO for AFM filesets. Also note that refresh intervals are stored as part of inode and for the large number of file access it is possible that inodes are evicted as part of dcache shrinkage and next access to the same files might go to home for the revalidation. afmRefreshAsync option can be set at fleset level also. Looks like it is missing from the documentation, this will be corrected. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: gpfsug main discussion list Date: 10/03/2019 07:25 PM Subject: [EXTERNAL] Re: [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org After further investigaion, it seems like this XDS software is using memory mapped io when operating on the files. Is it possible that MMAP IO has a higher performance hit by AFM than regular file access? /Andreas ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Andreas Mattsson Skickat: den 1 oktober 2019 08:33:35 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, I've tried increasing all the refresh intervals, but even at 300 seconds, there is very little performance increase. The job runs in several steps, and gets held up at two places, as far as I can see. First at a kind of parallelisation step where about 1000-3000 files are created in the current working folder on a single compute node, and then at a step where lots of small output files are written on each of the compute nodes involved in the job. Comparing with running the same data set on a non-AFM cache fileset in the same storage system, it runs at least a factor 5 slower, even with really high refresh intervals. In the Scale documentation, it states that the afmRefreshAsync is only configurable cluster wide. Is it also configurable on a per-fileset level? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.3/com.ibm.spectrum.scale.v5r03.doc/bl1adm_configurationparametersAFM.htm The software is XDS, http://xds.mpimf-heidelberg.mpg.de/ Unfortunately it is a closed source software, so it is not possible to adapt the software. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. 
Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Venkateswara R Puvvada Skickat: den 27 september 2019 10:23:13 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] afmRefreshAsync questions Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=vrw7qt4uEH-dBuEZSxUvPQM-SJOC0diQptL6vnfxCQA&s=rbRvqgv05seDPo5wFgK2jlRkzvHtU7y7zoNQ3rDV0d0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From st.graf at fz-juelich.de Mon Oct 7 08:22:02 2019 From: st.graf at fz-juelich.de (Stephan Graf) Date: Mon, 7 Oct 2019 09:22:02 +0200 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? In-Reply-To: References: Message-ID: <9c1fcd81-d947-e857-ffc8-b68d17142bfb@fz-juelich.de> Hi Kristi, I just want to mention that we have a ticket right now at IBM because of negative quota values. In our case even the '-e' does not work: [root at justnsd01a ~]#? 
mmlsquota -j hpsadm -e largedata Block Limits | File Limits Filesystem type KB quota limit in_doubt grace | files quota limit in_doubt grace Remarks largedata FILESET -45853247616 536870912000 590558003200 0 none | 6 3000000 3300000 0 none The solution offered by support is to run a 'mmcheckquota'. we are still in discussion. Stephan On 10/4/19 10:53 PM, Kristy Kallback-Rose wrote: > Hi, > > There is a flag with mmlsquota to prevent the potential of getting > negative values back: > > -e > Specifies that mmlsquota is to collect updated quota usage data from all > nodes before displaying results. If -e is not specified, there is the > potential to display negative usage values as the quota server may > process a combination of up-to-date and back-level information. > > > However, we are using the API to collectively show quotas across GPFS > and non-GPFS filesystems via one user-driven command. We are getting > negative values using the API. Does anyone know the -e equivalent for > the API? > > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm > > Thanks, > Kristy > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Stephan Graf Juelich Supercomputing Centre Institute for Advanced Simulation Forschungszentrum Juelich GmbH 52425 Juelich, Germany Phone: +49-2461-61-6578 Fax: +49-2461-61-6656 E-mail: st.graf at fz-juelich.de WWW: http://www.fz-juelich.de/jsc/ -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5322 bytes Desc: S/MIME Cryptographic Signature URL: From jonathan.buzzard at strath.ac.uk Mon Oct 7 15:07:55 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Mon, 7 Oct 2019 14:07:55 +0000 Subject: [gpfsug-discuss] Large in doubt on fileset Message-ID: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> I have a DSS-G system running 4.2.3-7, and on Friday afternoon became aware that there is a very large (at least I have never seen anything on this scale before) in doubt on a fileset. It has persisted over the weekend and is sitting at 17.5TB, with the fileset having a 150TB quota and only 82TB in use. There is a relatively large 26,500 files in doubt, though there is no quotas on file numbers for the fileset. This has come down from some 47,500 on Friday when the in doubt was a shade over 18TB. The largest in doubt I have seen in the past was in the order of a few hundred GB under very heavy write that went away very quickly after the writing stopped. There is no evidence of heavy writing going on in the file system so I am perplexed as to why the in doubt is remaining so high. Any thoughts as to what might be going on? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From pinto at scinet.utoronto.ca Mon Oct 7 15:24:38 2019 From: pinto at scinet.utoronto.ca (Jaime Pinto) Date: Mon, 7 Oct 2019 10:24:38 -0400 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> Message-ID: <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> We run DSS as well, also 4.2.x versions, and large indoubt entries are common on our file systems, much larger than what you are seeing, for USR, GRP and FILESET. It didn't use to be so bad on versions 3.4|3.5 in other IBM appliances (GSS, ESS), even DDN's or Cray G200. Under 4.x series the internal automatic mechanism to reconcile accounting seems very laggy by default, and I couldn't find (yet) a config parameter to adjust this. I stopped trying to understand why this happens. Our users are all subject to quotas, and can't wait indefinitely for this reconciliation. I just run mmcheckquota every 6 hours via a crontab. I hope version 5 is better. Will know in a couple of months. Jaime On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote: > > I have a DSS-G system running 4.2.3-7, and on Friday afternoon became > aware that there is a very large (at least I have never seen anything > on this scale before) in doubt on a fileset. It has persisted over the > weekend and is sitting at 17.5TB, with the fileset having a 150TB quota > and only 82TB in use. > > There is a relatively large 26,500 files in doubt, though there is no > quotas on file numbers for the fileset. This has come down from some > 47,500 on Friday when the in doubt was a shade over 18TB. > > The largest in doubt I have seen in the past was in the order of a few > hundred GB under very heavy write that went away very quickly after the > writing stopped. > > There is no evidence of heavy writing going on in the file system so I > am perplexed as to why the in doubt is remaining so high. > > Any thoughts as to what might be going on? > > > JAB. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 From TOMP at il.ibm.com Mon Oct 7 17:22:13 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Mon, 7 Oct 2019 19:22:13 +0300 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> Message-ID: Hi, The major change around 4.X in quotas was the introduction of dynamic shares. In the past, every client share request was for constant number of blocks ( 20 blocks by default). For high performing system, it wasn't enough sometime ( imagine 320M for nodes are writing at 20GB/s). So, dynamic shares means that a client node can request 10000 blocks etc. etc. ( it doesn't mean that the server will provide those...). OTOH, node failure will leave more "stale in doubt" capacity since the server don't know how much of the share was actually used. Imagine a client node getting 1024 blocks ( 16G), using 20M and crashing. >From the server perspective, there are 16G "unknown", now multiple that by multiple nodes... 
The only way to solve it is indeed to execute mmcheckquota - but as you probably know, its not cheap. So, do you experience large number of node expels/crashes etc. that might be related to that ( otherwise, it might be some other bug that needs to be fixed...). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Jaime Pinto To: gpfsug-discuss at spectrumscale.org Date: 07/10/2019 17:40 Subject: [EXTERNAL] Re: [gpfsug-discuss] Large in doubt on fileset Sent by: gpfsug-discuss-bounces at spectrumscale.org We run DSS as well, also 4.2.x versions, and large indoubt entries are common on our file systems, much larger than what you are seeing, for USR, GRP and FILESET. It didn't use to be so bad on versions 3.4|3.5 in other IBM appliances (GSS, ESS), even DDN's or Cray G200. Under 4.x series the internal automatic mechanism to reconcile accounting seems very laggy by default, and I couldn't find (yet) a config parameter to adjust this. I stopped trying to understand why this happens. Our users are all subject to quotas, and can't wait indefinitely for this reconciliation. I just run mmcheckquota every 6 hours via a crontab. I hope version 5 is better. Will know in a couple of months. Jaime On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote: > > I have a DSS-G system running 4.2.3-7, and on Friday afternoon became > aware that there is a very large (at least I have never seen anything > on this scale before) in doubt on a fileset. It has persisted over the > weekend and is sitting at 17.5TB, with the fileset having a 150TB quota > and only 82TB in use. > > There is a relatively large 26,500 files in doubt, though there is no > quotas on file numbers for the fileset. This has come down from some > 47,500 on Friday when the in doubt was a shade over 18TB. > > The largest in doubt I have seen in the past was in the order of a few > hundred GB under very heavy write that went away very quickly after the > writing stopped. > > There is no evidence of heavy writing going on in the file system so I > am perplexed as to why the in doubt is remaining so high. > > Any thoughts as to what might be going on? > > > JAB. > ************************************ TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=2KzJ8YjgXm5NsAjcpquw6pMVJFbLUBZ-KEQb2oHFYqs&s=esG-w1Wj_wInSHpT5fEhqVQMqpR15ZXaGxoQmjOKdDc&e= ************************************ --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=2KzJ8YjgXm5NsAjcpquw6pMVJFbLUBZ-KEQb2oHFYqs&s=dxj6p74pt5iaKKn4KvMmMPyLcUD5C37HbIc2zX-iWgY&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jonathan.buzzard at strath.ac.uk Tue Oct 8 11:45:38 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 8 Oct 2019 10:45:38 +0000 Subject: [gpfsug-discuss] Large in doubt on fileset In-Reply-To: References: <75222a1da28a8b278c655863d1c1f634830f4435.camel@strath.ac.uk> <4b450056-f1fe-05ed-3bd7-cae4082b3694@scinet.utoronto.ca> Message-ID: <841c1fd793b4179ea8e27b88f3ed1c7e0f76cb4e.camel@strath.ac.uk> On Mon, 2019-10-07 at 19:22 +0300, Tomer Perry wrote: [SNIP] > > So, do you experience large number of node expels/crashes etc. that > might be related to that ( otherwise, it might be some other bug that > needs to be fixed...). > Not as far as I can determine. The logs show only 58 expels in the last six months and around 2/3rds of those where on essentially dormant nodes that where being used for development work on fixing issues with the xcat node deployment for the compute nodes (triggering an rinstall on a node that was up with GPFS mounted but actually doing nothing). I have done an mmcheckquota which didn't take long to complete and now I the "in doubt" is a more reasonable sub 10GB. I shall monitor what happens more closely in future. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Tue Oct 8 14:15:48 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 8 Oct 2019 09:15:48 -0400 Subject: [gpfsug-discuss] Quota via API anyway to avoid negative values? In-Reply-To: References: Message-ID: Kristy, there is no equivalent to the -e option in the quota API. If your application receives negative quota values it is suggested that you use the mmlsquota command with the -e option to obtain the most recent quota usage information, or run the mmcheckquota command. Using either the -e option to mmlsquota or the mmcheckquota is an IO intensive operation so it would be wise not to run the command when the system is heavily loaded. Note that using the mmcheckquota command does provide QoS options to mitigate the impact of the operation on the cluster. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Kristy Kallback-Rose To: gpfsug main discussion list Date: 10/04/2019 04:53 PM Subject: [EXTERNAL] [gpfsug-discuss] Quota via API anyway to avoid negative values? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, There is a flag with mmlsquota to prevent the potential of getting negative values back: -e Specifies that mmlsquota is to collect updated quota usage data from all nodes before displaying results. If -e is not specified, there is the potential to display negative usage values as the quota server may process a combination of up-to-date and back-level information. 
However, we are using the API to collectively show quotas across GPFS and non-GPFS filesystems via one user-driven command. We are getting negative values using the API. Does anyone know the -e equivalent for the API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_gpfs_quotactl.htm Thanks, Kristy_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hdhTNLoVRkMglSs8c9Ho37FKFZUJrCmrXG5pXqjtFbE&s=wfHn6xg9_2qzVFdBAthevvEHreS934rP1w88f3jSFcs&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Wed Oct 9 16:50:31 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 9 Oct 2019 17:50:31 +0200 Subject: [gpfsug-discuss] =?utf-8?q?Fw=3A___Agenda_and_registration_link_/?= =?utf-8?q?/_Oct_10_-_Spectrum=09Scale_NYC_User_Meeting?= Message-ID: Reminder about the user meeting in NYC tomorrow. https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 09/10/2019 17:46 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 20/09/2019 10:12 Subject: [EXTERNAL] [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. 
Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=x_he-vxYPdTCut1I-gX7dq5MQmsSZA_1952yvpisLn0&s=ghgxcu8zRWQLv9DIXJ3-CX14SDFrx3hYKsjt-_IWZIM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Thu Oct 10 21:43:45 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 10 Oct 2019 15:43:45 -0500 Subject: [gpfsug-discuss] waiters and files causing waiters Message-ID: is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Thu Oct 10 22:26:35 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Thu, 10 Oct 2019 17:26:35 -0400 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: The short answer is there is no easy way to determine what file/directory a waiter may be related. Generally, it is not necessary to know the file/directory since a properly sized/configured cluster should not have long waiters occurring, unless there is some type of failure in the cluster. If you were to capture sufficient information across the cluster you might be able to work out the file/directory involved in a long waiter but it would take either trace, or combing through lots of internal data structures. It would be helpful to know more details about your cluster to provide suggestions for what may be causing the long waiters. I presume you are seeing them on a regular basis and would like to understand why they are occurring. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: Damir Krstic To: gpfsug main discussion list Date: 10/10/2019 04:44 PM Subject: [EXTERNAL] [gpfsug-discuss] waiters and files causing waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=9T66XmHIdF5y7JaNmf28qRGIn35K4t-9H7vwGkDMjgo&s=ncg0MQla29iX--sQeAmcB2XqE3_7zSFGmhnDgj9s--w&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex at calicolabs.com Thu Oct 10 23:33:06 2019 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 10 Oct 2019 15:33:06 -0700 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: If the waiters are on a compute node and there is not much user work running there, then the open files listed by lsof will probably be the culprits. On Thu, Oct 10, 2019 at 1:44 PM Damir Krstic wrote: > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to > figure out which files or directories access (whether it's read or write) > is causing long-er waiters? > > in all my looking i have not been able to get that information out of > various diagnostic commands. > > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 00:05:16 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 10 Oct 2019 23:05:16 +0000 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> I?ll dig through my notes. I had a similar situation and an engineer taught me how to do it. It?s a bit involved though. Not something you?d bother with for something transient. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 10, 2019, at 16:44, Damir Krstic wrote: ? is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? in all my looking i have not been able to get that information out of various diagnostic commands. thanks, damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Oct 11 17:07:30 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 11 Oct 2019 16:07:30 +0000 Subject: [gpfsug-discuss] User Group Meeting at SC19 - Registration is Open! 
Message-ID: <9C59AEAC-C26D-47ED-9321-BCC6A58F2E05@nuance.com> Join us at SC19 for the user group meeting on Sunday November 17th at the Hyatt Regency in Denver! This year there will be a morning session for new users to Spectrum Scale. Afternoon portion will be a collection of updates from IBM and user/sponsor talks. Details Here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ (watch here for agenda updates) You do need to pre-register here: http://www.ibm.com/events/2019/SC19_BC This year we will have a limited number of box lunches available for users, free of charge. We?ll also have WiFi access for the attendees - Huzzah! Many thanks to our sponsors: IBM, Starfish Software, Mark III Systems, and Lenovo for helping us make this event possible and free of charge to all attendees. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bamirzadeh at tower-research.com Fri Oct 11 18:04:08 2019 From: bamirzadeh at tower-research.com (Behrooz Amirzadeh) Date: Fri, 11 Oct 2019 13:04:08 -0400 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> References: <7E19298C-2C28-48F9-BE89-F91B9EC66866@rutgers.edu> Message-ID: I think it depends on the type of deadlock. For example, if hung nodes are the cause of the deadlock. I don't think there will be any files to go after. I've seen that it is possible in certain cases but no guarantees. When the deadlock is detected you can look at the internaldump that gets created on the deadlock node, for example: ===== dump deadlock ===== Current time 2019-09-24_10:17:30-0400 Waiting 904.5729 sec since 10:02:25, on node aresnode7132, thread 3584968 SyncFSWorkerThread: on ThCond 0x18042226DB8 (LkObjCondvar), reason 'waiting for RO lock' Then you search in the same file for the ThCond further down. You'll most likely see that it is associated with a mutex ===== dump condvar ===== Current time 2019-09-24_10:17:32-0400 . . 
'LkObjCondvar' at 0x18042226DB8 (0xFFFFC90042226DB8) (mutex 'InodeCacheObjMutex' at 0x18042226C08 (0xFFFFC90042226C08 PTR_OK)) waitCount 1 condvarEventWordP 0xFFFF880DB4AAF088 Then you'll search for the that mutex in the same file ===== dump selected_files ===== Current time 2019-09-24_10:17:32-0400 Files in stripe group gpfs0: Selected: LkObj::mostWanted: 0x18042226D80 lock_state=0x2000000000000000 xlock_state=0x0 lock_flags=0x11 OpenFile: 429E985A0BFE280A:000000008285ECBD:0000000000000000 @ 0x18042226BD8 cach 1 ref 1 hc 3 tc 6 mtx 0x18042226C08 Inode: valid eff token xw @ 0x18042226D80, ctMode xw seq 175 lock state [ xw ] x [] flags [ dmn wka ] writer 39912 hasWaiters 1 0 Mnode: valid eff token xw @ 0x18042226DD0, ctMode xw seq 175 DMAPI: invalid eff token nl @ 0x18042226D30, ctMode nl seq 174 SMBOpen: valid eff token (A: M D: ) @ 0x18042226C60, ctMode (A: M D: ) Flags 0x30 (pfro+pfxw) seq 175 lock state [ (nil) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x18042226CD0, ctMode wf Flags 0x30 (pfro+pfxw) seq 175 BR: @ 0x18042226E30, ctMode nl Flags 0x10 (pfro) seq 175 treeP 0x18048C1EFB8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <1335> Fcntl: @ 0x18042226E58, ctMode nl Flags 0x30 (pfro+pfxw) seq 175 treeP 0x1801EBA7EE8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <1335> * inode 2189814973* snap 0 USERFILE nlink 1 genNum 0x2710E0CC mode 0200100644: -rw-r--r-- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 0 lastFrom 65535 switchCnt 0 BRL nXLocksOrRelinquishes 6 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 lastAllocLsn 0xB8740C5E metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 1 dirty status: dirty fileDirty 1 fileDirtyOrUncommitted 1 dirtiedSyncNum 81078 inodeValid 1 inodeDirtyCount 5 objectVersion 1 mtimeDirty 1 flushVersion 8983 mnodeChangeCount 1 dirtyDataBufs 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 10213 synchedFileSize 0 indirectionLevel 1 atime 1569333733.493440000 mtime 1569333742.784833000 ctime 1569333742.784712266 crtime 1569333733.493440000 * owner uid 6572 gid 3047* If you were lucky and all of these were found you can get the inode and the uid/gid of the owner of the file. If you happen to catch it quick enough you'll be able to find the file with lsof. Otherwise later with an ilm policy run if the file has not been deleted by the user. Behrooz On Thu, Oct 10, 2019 at 7:05 PM Ryan Novosielski wrote: > I?ll dig through my notes. I had a similar situation and an engineer > taught me how to do it. It?s a bit involved though. Not something you?d > bother with for something transient. > > -- > ____ > || \\UTGERS, > |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, > Newark > `' > > On Oct 10, 2019, at 16:44, Damir Krstic wrote: > > ? > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to > figure out which files or directories access (whether it's read or write) > is causing long-er waiters? > > in all my looking i have not been able to get that information out of > various diagnostic commands. 
> > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 18:43:15 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 11 Oct 2019 17:43:15 +0000 Subject: [gpfsug-discuss] Quotas and AFM Message-ID: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From S.J.Thompson at bham.ac.uk Fri Oct 11 20:56:04 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 19:56:04 +0000 Subject: [gpfsug-discuss] Quotas and AFM Message-ID: Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? 
Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Fri Oct 11 21:05:15 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 11 Oct 2019 20:05:15 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: Message-ID: Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. 
At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 11 21:10:20 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 20:10:20 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , Message-ID: Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. 
Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? 
We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Oct 11 21:21:59 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Oct 2019 20:21:59 +0000 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , , Message-ID: Oh and I forgot. This only works if you precache th data from home. Otherwise the disk usage at cache is only what you cached, as you don't know what size it is from home. Unless something has changed recently at any rate. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Friday, October 11, 2019 9:10:20 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. 
If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon ________________________________ From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Oct 14 06:11:21 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 14 Oct 2019 10:41:21 +0530 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> References: <82FEC032-CC22-4535-A490-2FF35E0D625C@rutgers.edu> Message-ID: As of today AFM does not support replication or caching of the filesystem or fileset level metadata like quotas, replication factors etc.. , it only supports replication of user's metadata and data. Users have to make sure that same quotas are set at both cache and home clusters. An error message is logged (mmfs.log) at AFM cache gateway if the home have quotas exceeded, and the queue will be stuck until the quotas are increased at the home cluster. 
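For example, something along these lines can be run the same way against both the cache and the home cluster (the device, fileset and user names below are only placeholders):

mmsetquota fs1:home --user jdoe --block 100G:110G
mmrepquota -u fs1

On the cache side the gateway queue can also be watched with:

mmafmctl fs1 getstate -j home

A steadily growing queue length for the fileset is usually the first sign that writes are being requeued because home is rejecting them.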
~Venkat (vpuvvada at in.ibm.com) From: Ryan Novosielski To: gpfsug main discussion list Date: 10/11/2019 11:13 PM Subject: [EXTERNAL] [gpfsug-discuss] Quotas and AFM Sent by: gpfsug-discuss-bounces at spectrumscale.org Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=v6Rlb90lfAveMK0img3_DIq6tq6dce4WXaxNhN0TDBQ&s=PNlMZJgKMhodVCByv07nOOiyF2Sr498Rd4NmIaOkL9g&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Mon Oct 14 07:29:05 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Mon, 14 Oct 2019 11:59:05 +0530 Subject: [gpfsug-discuss] Quotas and AFM In-Reply-To: References: , , Message-ID: As Simon already mentioned, set the similar quotas at both cache and home clusters to avoid the queue stuck problem due to quotas being exceeds home. >At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet >quotas. AFM will support dependent filesets from 5.0.4. Dependent filesets can be created at the cache in the independent fileset and set the same quotas from the home >We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. AFM uses some inode space to store the remote file attributes like file handle, file times etc.. as part of the EAs. If the file does not have hard links, maximum inode space used by the AFM is around 200 bytes. AFM cache can store the file's data in the inode if it have 200 bytes of more free space in the inode, otherwise file's data will be stored in subblock rather than using the full block. 
For example if the inode size is 4K at both cache and home, if the home file size is 3k and inode is using 300 bytes to store the file metadata, then free space in the inode at the home will be 724 bytes(4096 - (3072 + 300)). When this file is cached by the AFM , AFM adds internal EAs for 200 bytes, then the free space in the inode at the cache will be 524 bytes(4096 - (3072 + 300 + 200)). If the filesize is 3600 bytes at the home, AFM cannot store the data in the inode at the cache. So AFM stores the file data in the block only if it does not have enough space to store the internal EAs. ~Venkat (vpuvvada at in.ibm.com) From: Simon Thompson To: gpfsug main discussion list Date: 10/12/2019 01:52 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] Quotas and AFM Sent by: gpfsug-discuss-bounces at spectrumscale.org Oh and I forgot. This only works if you precache th data from home. Otherwise the disk usage at cache is only what you cached, as you don't know what size it is from home. Unless something has changed recently at any rate. Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson Sent: Friday, October 11, 2019 9:10:20 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Yes just set the quotas the same on both. Or a default quota and have exceptions if that works in your case. But this was where I think the inode in file is an issue if you have a lot of small files as in the inode at home they don't consume quota I think but as they are in a data block at cache they do. So it might now be quite so straightforward. And yes writes at home just get out of space, it's the AFM cache that fails on the write back to home but then its in the queue and can block it. Simon From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Ryan Novosielski Sent: Friday, October 11, 2019 9:05:15 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Quotas and AFM Do you know is there anything that prevents me from just setting the quotas the same on the IW cache, if there?s no way to inherit? For the case of the home directories, it?s simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that?s right now where our problem is, but we have the potential with other filesets later. I?m also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the at home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don?t add to the queue. I assume GPFS sees the over quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. ? -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Oct 11, 2019, at 15:56, Simon Thompson wrote: ? Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill user workload and then reduce their quota again. 
At home we had replication of two so it wasn't straight forward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileaet quotas. We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now. So our experience was much like you describe. Simon From: on behalf of Ryan Novosielski Sent: Friday, 11 October 2019, 18:43 To: gpfsug main discussion list Subject: [gpfsug-discuss] Quotas and AFM Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home one one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW. One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we?re running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache, but the cache is accepting it and then failing to sync back to the home because the user is at their hard limit. I believe we?re also seeing delays on unaffected users who are not over their quota, but that?s harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting. Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less. Thanks! -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=FQMV8_Ivetm1R6_TcCWroPT58pjhPJgL39pgOdQEiqw&s=DfvksQLrKgv0OpK3Dr5pR-FUkhNddIvieh9_8h1jyGQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 13:34:33 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 12:34:33 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs Message-ID: We are in the process of changing the way GPFS assigns UID/GIDs from internal tdb to using AD RIDs with an offset that matches our linux systems. We, therefore, need to change the ACLs for all the files in GPFS (up to 80 million). We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs being applied. (This system was set up 14 years ago and has changed roles over time) We are running on linux, so need to have POSIX permissions enabled. 
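(For what it is worth, the ACL semantics the file system is actually enforcing can be confirmed with the following - the device name is just a placeholder:

mmlsfs gpfs01 -k

which I would expect to report 'all' in our case given the mix, and mmchfs -k nfs4 would presumably be the eventual way to go NFSv4-only if we ever drop the POSIX requirement.)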
What I want to know for those in a similar environment, what do you have as the POSIX owner and group, when NFSv4 ACLs are in use? root:root or do you have all files owned by a filesystem administrator account and group: : on our samba shares we have : admin users = @ So don't actually need the group defined in POSIX.
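For context, the rough plan for the bulk UID/GID change itself is a policy scan feeding a remapping script rather than a find across 80 million files. This is a sketch only - the file system name, the threshold and the output path are all invented:

RULE EXTERNAL LIST 'reown' EXEC ''
RULE 'oldIds' LIST 'reown' WHERE USER_ID < 10000000  /* i.e. anything still below our AD RID offset */

mmapplypolicy gpfs01 -P reown.pol -f /tmp/reown -I defer

The deferred list files would then be fed to a wrapper that maps each old tdb UID/GID to its RID+offset equivalent and runs chown, so the whole exercise stays as plain metadata updates.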
Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 15 15:30:28 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 15 Oct 2019 14:30:28 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From S.J.Thompson at bham.ac.uk Tue Oct 15 16:41:35 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 15:41:35 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. 
> > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From stockf at us.ibm.com Tue Oct 15 17:09:14 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 15 Oct 2019 16:09:14 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 17:15:50 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 16:15:50 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: An amalgamated answer... > You do realize that will mean backing everything up again... From the tests that I have done, it appears not. A Spectrum protect incremental backup performs an 'update' when the ACL is changed via mmputacl or chown. when I do a backup after an mmputacl or chown ACL change on a migrated file, it isn't recalled, so it cant be backing up the file. If I do the same change from windows over a smb mount, it does cause the file to be recalled and backedup. > ...I am not sure why you need POSIX ACL's if you are running Linux... From what I have recently read... https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." As I said this system has had roles added to it. The original purpose was to only support NFS exports, then as a staging area for IT, as end user access wasn't needed, only POSIX permissions were used. No it has end user SMB mounts. >?chmodAndSetAcl? Saw this recently - will look at changing to that! https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_authoriziefileprotocolusers.htm "To allow proper use of ACLs, it is recommended to prevent chmod from overwriting the ACLs by setting this parameter to setAclOnly or chmodAndSetAcl." >#owner:root OK so you do have root as the owner. > special:owner@:rwxc:allow:FileInherit:DirInherit And have it propagated to children. > group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit We by default assign two groups to a folder, a RW and R only. > special:everyone@:----:allow > special:owner@:rwxc:allow > special:group@:rwx-:allow I have been removing these. 
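On the chmodAndSetAcl point above: as far as I can tell it is set per fileset, so for us it would presumably be something along the lines of the following (device and fileset names made up, not tested here yet):

mmchfileset gpfs01 fileset01 --allow-permission-change chmodAndSetAcl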
This seems to work, but was set via windows: POSIX: d--------- 2 root root 512 Apr 11 2019 #NFSv4 ACL #owner:root #group:root #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED # NULL_SACL group:dg--ro:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:dg--rwm:rwx-:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:dl-:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED So is root as the owner the norm? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 15 October 2019 15:30 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C54e024b8b52b4a70208e08d7517c47fc%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637067466552637538&sdata=v43g1MEBnRBZP%2B5J7ORvywIq6poqhK24fTsCco0IEDo%3D&reserved=0 From p.ward at nhm.ac.uk Tue Oct 15 17:18:15 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 16:18:15 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? 
> root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Tue Oct 15 17:49:34 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Tue, 15 Oct 2019 16:49:34 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: An HTML attachment was scrubbed... URL: From p.ward at nhm.ac.uk Tue Oct 15 19:27:01 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Tue, 15 Oct 2019 18:27:01 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, Message-ID: I have tested replacing POSIX with NFSv4, I have altered POSIX and altered NFSv4. The example below is NFSv4 changed to POSIX I have also tested on folders. Action Details Pre Changes File is backed up, migrated and has a nfsv4 ACL > ls -l ---------- 1 root 16777221 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mb-9.dat >mmgetacl #NFSv4 ACL #owner:root #group:16777221 group:1399645580:rwx-:allow:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:16783540:rwx-:allow:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:16777360:r-x-:allow:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:1399621272:r-x-:allow:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Erase the nfsv4 acl chown root:root chmod 770 POSIX permissions changed and NFSv4 ACL gone > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mb-9.dat >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- Incremental backup Backup ?updates? the backup, but doesn?t transfer any data. dsmc incr "100mb-9.dat" IBM Tivoli Storage Manager Command Line Backup-Archive Client Interface Client Version 7, Release 1, Level 6.4 Client date/time: 10/15/2019 17:57:59 (c) Copyright by IBM Corporation and other(s) 1990, 2016. All Rights Reserved. 
Node Name: NHM-XXX-XXX Session established with server TSM-XXXXXX: Windows Server Version 7, Release 1, Level 7.0 Server date/time: 10/15/2019 17:57:58 Last access: 10/15/2019 17:57:52 Accessing as node: XXX-XXX Incremental backup of volume '100mb-9.dat' Updating--> 102,400,000 /?/100mb-9.dat [Sent] Successful incremental backup of '/?/100mb-9.dat' Total number of objects inspected: 1 Total number of objects backed up: 0 Total number of objects updated: 1 Total number of objects rebound: 0 Total number of objects deleted: 0 Total number of objects expired: 0 Total number of objects failed: 0 Total number of objects encrypted: 0 Total number of objects grew: 0 Total number of retries: 0 Total number of bytes inspected: 97.65 MB Total number of bytes transferred: 0 B Data transfer time: 0.00 sec Network data transfer rate: 0.00 KB/sec Aggregate data transfer rate: 0.00 KB/sec Objects compressed by: 0% Total data reduction ratio: 100.00% Elapsed processing time: 00:00:01 Post backup Active Backup timestamp hasn?t changed, and file is still migrated. > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat > dsmls 102400000 0 0 m 100mb-9.dat > dsmc q backup ?? -inac 102,400,000 B 09/18/2019 15:53:41 NHM_DATA_MC A /?/100mbM/100mb-9.dat 102,400,000 B 09/18/2019 15:08:58 NHM_DATA_MC I /?/100mbM/100mb-9.dat >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- Restore dsmc restore "100mb-9.dat" "100mb-9.dat.restore" IBM Tivoli Storage Manager Command Line Backup-Archive Client Interface Client Version 7, Release 1, Level 6.4 Client date/time: 10/15/2019 18:02:09 (c) Copyright by IBM Corporation and other(s) 1990, 2016. All Rights Reserved. Node Name: NHM-XXX-XXX Session established with server TSM-XXXXXX: Windows Server Version 7, Release 1, Level 7.0 Server date/time: 10/15/2019 18:02:08 Last access: 10/15/2019 18:02:07 Accessing as node: HSM-NHM Restore function invoked. Restoring 102,400,000 /?/100mb-9.dat --> /?/100mb-9.dat.restore [Done] Restore processing finished. Total number of objects restored: 1 Total number of objects failed: 0 Total number of bytes transferred: 97.66 MB Data transfer time: 1.20 sec Network data transfer rate: 83,317.88 KB/sec Aggregate data transfer rate: 689.11 KB/sec Elapsed processing time: 00:02:25 Restored file Restored file has the same permissions as the last backup > ls -l -rwxrwx--- 1 root root 102400000 Sep 18 15:07 100mb-9.dat.restore > dsmls 102400000 102400000 160 r 100mb-9.dat.restore > dsmc q backup ?? -inac ANS1092W No files matching search criteria were found >mmgetacl #owner:root #group:root user::rwxc group::rwx- other::---- I have just noticed: File backedup with POSIX ? restored file permissions POSIX File backedup with POSIX, changed to NFSv4 permissions, incremental backup ? restore file permissions POSIX File backedup with NFSv4, Changed to POSIX permissions, incremental backup ? restore file permissions POSIX File backedup with NFSv4, restore file permissions NFSv4 (there may be other variables involved) Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:50 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs Thanks Paul. Could you please clarify which ACL you changed, the GPFS NFSv4 ACL or the POSIX ACL? 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Paul Ward > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 12:18 PM Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 19:46:06 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 18:46:06 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: Only the top level of the project is root:root, not all files. The owner inherit is like CREATOROWNER in Windows, so the parent owner isn't inherited, but the permission inherits to newly created files. It was a while ago we worked out our permission defaults but without it we could have users create a file/directory but not be able to edit/change it as whilst the group had permission, the owner didn't. I should note we are all at 5.x code and not 4.2. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Paul Ward Sent: Tuesday, October 15, 2019 5:15:50 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs An amalgamated answer... > You do realize that will mean backing everything up again... >From the tests that I have done, it appears not. A Spectrum protect incremental backup performs an 'update' when the ACL is changed via mmputacl or chown. when I do a backup after an mmputacl or chown ACL change on a migrated file, it isn't recalled, so it cant be backing up the file. If I do the same change from windows over a smb mount, it does cause the file to be recalled and backedup. > ...I am not sure why you need POSIX ACL's if you are running Linux... >From what I have recently read... https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." As I said this system has had roles added to it. The original purpose was to only support NFS exports, then as a staging area for IT, as end user access wasn't needed, only POSIX permissions were used. No it has end user SMB mounts. >?chmodAndSetAcl? Saw this recently - will look at changing to that! https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_authoriziefileprotocolusers.htm "To allow proper use of ACLs, it is recommended to prevent chmod from overwriting the ACLs by setting this parameter to setAclOnly or chmodAndSetAcl." >#owner:root OK so you do have root as the owner. > special:owner@:rwxc:allow:FileInherit:DirInherit And have it propagated to children. > group:gITS_BEAR_2019- some-project:rwxc:allow:FileInherit:DirInherit We by default assign two groups to a folder, a RW and R only. > special:everyone@:----:allow > special:owner@:rwxc:allow > special:group@:rwx-:allow I have been removing these. 
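On the chmodAndSetAcl point above, as far as I can tell it is a per-fileset setting rather than cluster wide, so for us it would be something along these lines (filesystem and fileset names are made up, and I have not actually tried this yet):

   # allow both chmod and ACL changes without chmod overwriting the ACL
   mmchfileset gpfs01 nhm-projects --allow-permission-change chmodAndSetAcl

I will test it on a scratch fileset first in case it touches existing ACLs.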
This seems to work, but was set via windows: POSIX: d--------- 2 root root 512 Apr 11 2019 #NFSv4 ACL #owner:root #group:root #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED # NULL_SACL group:dg--ro:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED group:dg--rwm:rwx-:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED group:dl-:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED So is root as the owner the norm? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 15 October 2019 15:30 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. (This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C54e024b8b52b4a70208e08d7517c47fc%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637067466552637538&sdata=v43g1MEBnRBZP%2B5J7ORvywIq6poqhK24fTsCco0IEDo%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Tue Oct 15 19:50:54 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Oct 2019 18:50:54 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk>, , Message-ID: Fred, I thought like you that an ACL change caused a backup with mmbackup. Maybe only if you change the NFSv4 ACL. I'm sure it's documented somewhere and there is a flag to Protect to stop this from happening. Maybe a POSIX permission (setfacl style) doesn't trigger a backup. This would tie in with Paul's suggestion that changing via SMB caused the backup to occur. Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of stockf at us.ibm.com Sent: Tuesday, October 15, 2019 5:49:34 PM To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs Thanks Paul. Could you please clarify which ACL you changed, the GPFS NFSv4 ACL or the POSIX ACL? Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Paul Ward Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 12:18 PM Hi Fred, From the tests I have done changing the ACL results in just an ?update? to when using Spectrum Protect, even on migrated files. Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Frederick Stock Sent: 15 October 2019 17:09 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs As I understand if you change only the POSIX attributes on a file then you are correct that TSM will only backup the file metadata, actually just the POSIX relevant metadata. However, if you change ACLs or other GPFS specific metadata then TSM will backup the entire file, TSM does not keep all file metadata separate from the actual file data. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Date: Tue, Oct 15, 2019 11:41 AM I thought Spectrum Protect didn't actually backup again on a file owner change. Sure mmbackup considers it, but I think Protect just updates the metadata. There are also some other options for dsmc that can stop other similar issues if you change ctime maybe. (Other backup tools are available) Simon ?On 15/10/2019, 15:31, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" > wrote: On Tue, 2019-10-15 at 12:34 +0000, Paul Ward wrote: > We are in the process of changing the way GPFS assigns UID/GIDs from > internal tdb to using AD RIDs with an offset that matches our linux > systems. We, therefore, need to change the ACLs for all the files in > GPFS (up to 80 million). You do realize that will mean backing everything up again.... > We are running in mixed ACL mode, with some POSIX and some NFSv4 ACLs > being applied. 
(This system was set up 14 years ago and has changed > roles over time) We are running on linux, so need to have POSIX > permissions enabled. We run on Linux and only have NFSv4 ACL's applied. I am not sure why you need POSIX ACL's if you are running Linux. Very very few applications will actually check ACL's or even for that matter permissions. They just do an fopen call or similar and the OS either goes yeah or neah, and the app needs to do something in the case of neah. > > What I want to know for those in a similar environment, what do you > have as the POSIX owner and group, when NFSv4 ACLs are in use? > root:root > > or do you have all files owned by a filesystem administrator account > and group: > : > > on our samba shares we have : > admin users = @ > So don?t actually need the group defined in POSIX. > Samba works much better with NFSv4 ACL's. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org gpfsug.org _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 15 21:34:34 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 15 Oct 2019 20:34:34 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: On 15/10/2019 17:15, Paul Ward wrote: [SNIP] >> ...I am not sure why you need POSIX ACL's if you are running Linux... > From what I have recently read... > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm > "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." > Only if you are using the inbuilt kernel NFS server, which IMHO is awful from a management perspective. That is you have zero visibility into what the hell it is doing when it all goes pear shaped unless you break out dtrace. I am not sure that using dtrace on a production service to find out what is going on is "best practice". It also in my experience stops you cleanly shutting down most of the time. The sooner it gets removed from the kernel the better IMHO. If you are using protocol nodes which is the only supported option as far as I am aware then that does not apply. I would imagined if you are rolling your own Ganesha NFS server it won't matter either. Checking the code of the FSAL in Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. My understanding was one of the drivers for using Ganesha as an NFS server with GPFS was you can write a FSAL to do just that, in the same way as on Samba you load the vfs_gpfs module, unless you are into self flagellation I guess. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From YARD at il.ibm.com Wed Oct 16 05:41:39 2019 From: YARD at il.ibm.com (Yaron Daniel) Date: Wed, 16 Oct 2019 07:41:39 +0300 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: Message-ID: Hi In case you want to review with ls -l the POSIX permissions, please put the relevant permissions on the SMB share, and add CREATOROWNER & CREATETORGROUP. Than ls -l will show you the owner + group + everyone permissions. Regards Yaron Daniel 94 Em Ha'Moshavot Rd Storage Architect ? IL Lab Services (Storage) Petach Tiqva, 49527 IBM Global Markets, Systems HW Sales Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: yard at il.ibm.com Webex: https://ibm.webex.com/meet/yard IBM Israel From: Jonathan Buzzard To: "gpfsug-discuss at spectrumscale.org" Date: 15/10/2019 23:34 Subject: [EXTERNAL] Re: [gpfsug-discuss] default owner and group for POSIX ACLs Sent by: gpfsug-discuss-bounces at spectrumscale.org On 15/10/2019 17:15, Paul Ward wrote: [SNIP] >> ...I am not sure why you need POSIX ACL's if you are running Linux... > From what I have recently read... > https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_admnfsaclg.htm > "Linux does not allow a file system to be NFS V4 exported unless it supports POSIX ACLs." > Only if you are using the inbuilt kernel NFS server, which IMHO is awful from a management perspective. That is you have zero visibility into what the hell it is doing when it all goes pear shaped unless you break out dtrace. I am not sure that using dtrace on a production service to find out what is going on is "best practice". It also in my experience stops you cleanly shutting down most of the time. The sooner it gets removed from the kernel the better IMHO. If you are using protocol nodes which is the only supported option as far as I am aware then that does not apply. I would imagined if you are rolling your own Ganesha NFS server it won't matter either. Checking the code of the FSAL in Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. My understanding was one of the drivers for using Ganesha as an NFS server with GPFS was you can write a FSAL to do just that, in the same way as on Samba you load the vfs_gpfs module, unless you are into self flagellation I guess. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=Bn1XE9uK2a9CZQ8qKnJE3Q&m=b8w1GtIuT4M2ayhd-sZvIeIGVRrqM7QoXlh1KVj4Zq4&s=huFx7k3Vx10aZ-7AVq1HSVo825JPWVdFaEu3G3Dh-78&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1114 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3847 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4266 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/jpeg Size: 3747 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3793 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4301 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3739 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 3855 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 4338 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Oct 16 09:21:46 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 16 Oct 2019 08:21:46 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Wed Oct 16 09:25:22 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 16 Oct 2019 08:25:22 +0000 Subject: [gpfsug-discuss] Spectrum Scale Erasure Code Edition (ECE) RedPaper Draft is public now Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Wed Oct 16 10:35:44 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 16 Oct 2019 09:35:44 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: On Wed, 2019-10-16 at 08:21 +0000, Malahal R Naineni wrote: > >> Ganesha shows functions for converting between GPFS ACL's and the > ACL format as used by Ganesha. > > Ganesha only supports NFSv4 ACLs, so the conversion is a quick one. > kernel NFS server converts NFSv4 ACLs to POSIX ACLs (the mapping > isn't perfect) as many of the Linux file systems only support POSIX > ACLs (at least this was the behavior). > Yes but the point is you don't need POSIX ACL's on your file system if you are doing NFS exports if you use Ganesha as your NFS server and only do NFSv4 exports. It is then down to the client to deal with the ACL's which the Linux client does. In fact it has for as long as I can remember. There are even tools to manipulate the NFSv4 ACL's (see nfs4- acl-tools on RHEL and derivatives). What's missing is "rich ACL" support in the Linux kernel. www.bestbits.at/richacl/ which seems to be down at the moment. Though there has been activity on the user space utilities. https://github.com/andreas-gruenbacher/richacl/ Is it possible to get IBM to devote some resources to moving this along. It would make using GPFS on Linux with ACL's a more pleasant experience. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From p.ward at nhm.ac.uk Wed Oct 16 11:59:03 2019 From: p.ward at nhm.ac.uk (Paul Ward) Date: Wed, 16 Oct 2019 10:59:03 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , Message-ID: We are running GPFS 4.2.3 with Arcpix build 3.5.10 or 3.5.12. We don't have Ganesha in the build. I'm not sure about the NFS service. 
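I will check what is actually serving NFS when I get a chance, presumably something along these lines (assuming the Arcpix build is RHEL 7 based, which I have not verified):

   systemctl status nfs-server      # kernel NFS server
   ps -ef | grep -i ganesha         # any Ganesha processes
   cat /proc/fs/nfsd/versions       # NFS versions knfsd serves, if the module is loaded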
Thanks for the responses, its interesting how the discussion has branched into Ganesha and what ACL changes are picked up by Spectrum Protect and mmbackup (my next major change). Any more responses on what is the best practice for the default POSIX owner and group of files and folders, when NFSv4 ACLs are used for SMB shares? Kindest regards, Paul Paul Ward TS Infrastructure Architect Natural History Museum T: 02079426450 E: p.ward at nhm.ac.uk -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Jonathan Buzzard Sent: 16 October 2019 10:36 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] default owner and group for POSIX ACLs On Wed, 2019-10-16 at 08:21 +0000, Malahal R Naineni wrote: >> Ganesha shows functions for converting between GPFS ACL's and the ACL format as used by Ganesha. Ganesha only supports NFSv4 ACLs, so the conversion is a quick one. kernel NFS server converts NFSv4 ACLs to POSIX ACLs (the mapping isn't perfect) as many of the Linux file systems only support POSIX ACLs (at least this was the behavior). Yes but the point is you don't need POSIX ACL's on your file system if you are doing NFS exports if you use Ganesha as your NFS server and only do NFSv4 exports. It is then down to the client to deal with the ACL's which the Linux client does. In fact it has for as long as I can remember. There are even tools to manipulate the NFSv4 ACL's (see nfs4- acl-tools on RHEL and derivatives). What's missing is "rich ACL" support in the Linux kernel. https://l.antigena.com/l/wElAOKB71BMteh5p3MJsrMJ1piEPqSzVv7jGE7WAADAaMiBDMV~~SJdC~qYZEePn7-JksRn9_H6cg21GWyrYE77TnWcAWsMEnF3Nwuug0tRR7ud7GDl9vPM3iafYImA3LyGuQInuXsXilJ6R9e2qmotMPRr~Lsq9CHJ2fsu1dBR1EL622lakpWuKLhjucFNsxUODYLWWFMzVbWj_AigKVAIMEX8Xqs0hGKXpOmjJOTejZDjM8bOCA1-jl06wU3DoT-ad3latFOtGR-oTHHwhAmu792L7Grmas12aetAuhTHnCQ6BBtRLGR_-iVJFYKfdyJNMVsDeKcBEBKKFSZdF~7ozqBouoIAZPE6cOA8KQIeh6mt1~_n which seems to be down at the moment. Though there has been activity on the user space utilities. https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fandreas-gruenbacher%2Frichacl%2F&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C2c1e0145dadd4d35842508d7521c4b9c%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637068153793755413&sdata=aUmCoKIC1N5TU95ILatCp2IlmdJ1gKKL8y%2F1V3kWb3M%3D&reserved=0 Is it possible to get IBM to devote some resources to moving this along. It would make using GPFS on Linux with ACL's a more pleasant experience. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7Cp.ward%40nhm.ac.uk%7C2c1e0145dadd4d35842508d7521c4b9c%7C73a29c014e78437fa0d4c8553e1960c1%7C1%7C0%7C637068153793755413&sdata=ZXLszye50npdSFIu1FuLK3eDbUd%2BV5h29xP1N3XD0jQ%3D&reserved=0 From stockf at us.ibm.com Wed Oct 16 12:14:46 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Oct 2019 11:14:46 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... 
URL: From TROPPENS at de.ibm.com Wed Oct 16 13:51:25 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 16 Oct 2019 14:51:25 +0200 Subject: [gpfsug-discuss] Nov 5 - Spectrum Scale China User Meeting Message-ID: IBM will host a Spectrum Scale User Meeting on November 5 in Shanghai. Senior engineers of our development lab in Beijing will attend and present. Please register here: https://www.spectrumscaleug.org/event/spectrum-scale-china-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at esquad.de Wed Oct 16 17:00:00 2019 From: lists at esquad.de (Dieter Mosbach) Date: Wed, 16 Oct 2019 18:00:00 +0200 Subject: [gpfsug-discuss] SMB support on ppc64LE / SLES for SpectrumScale - please vote for RFE Message-ID: <89482a10-bb53-4b49-d37f-7ef2efb28b30@esquad.de> We want to use smb-protocol-nodes for a HANA-SpectrumScale cluster, unfortunately these are only available for RHEL and not for SLES. SLES has a market share of 99% in the HANA environment. I have therefore created a Request for Enhancement (RFE). https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=137250 If you need it, too, please vote for it! Thank you very much! Kind regards Dieter -- Unix and Storage System Engineer HORNBACH-Baumarkt AG Bornheim, Germany From jonathan.buzzard at strath.ac.uk Wed Oct 16 22:32:50 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 16 Oct 2019 21:32:50 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> Message-ID: On 15/10/2019 16:41, Simon Thompson wrote: > I thought Spectrum Protect didn't actually backup again on a file > owner change. Sure mmbackup considers it, but I think Protect just > updates the metadata. There are also some other options for dsmc that > can stop other similar issues if you change ctime maybe. > > (Other backup tools are available) > It certainly used too. I spent six months carefully chown'ing files one user at a time so as not to overwhelm the backup, because the first group I did meant no backup for about a week... I have not kept a close eye on it and have just worked on the assumption for the last decade of "don't do that". If it is no longer the case I apologize for spreading incorrect information. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From skylar2 at uw.edu Wed Oct 16 22:46:48 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 16 Oct 2019 21:46:48 +0000 Subject: [gpfsug-discuss] default owner and group for POSIX ACLs In-Reply-To: References: <31615908-BB7F-45C8-A9CC-CEA9D81CDE89@bham.ac.uk> Message-ID: <20191016214648.pnmjmc65e6d4amqi@utumno.gs.washington.edu> On Wed, Oct 16, 2019 at 09:32:50PM +0000, Jonathan Buzzard wrote: > On 15/10/2019 16:41, Simon Thompson wrote: > > I thought Spectrum Protect didn't actually backup again on a file > > owner change. Sure mmbackup considers it, but I think Protect just > > updates the metadata. 
There are also some other options for dsmc that > > can stop other similar issues if you change ctime maybe. > > > > (Other backup tools are available) > > > > It certainly used too. I spent six months carefully chown'ing files one > user at a time so as not to overwhelm the backup, because the first > group I did meant no backup for about a week... > > I have not kept a close eye on it and have just worked on the assumption > for the last decade of "don't do that". If it is no longer the case I > apologize for spreading incorrect information. TSM can store some amount of metadata in its database without spilling over to a storage pool, so whether a metadata update is cheap or expensive depends not just on ACLs/extended attributes but also the directory entry name length. It can definitely make for some seemingly non-deterministic backup behavior. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From jonathan.buzzard at strath.ac.uk Thu Oct 17 11:26:45 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 17 Oct 2019 10:26:45 +0000 Subject: [gpfsug-discuss] mmbackup questions Message-ID: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> I have been looking to give mmbackup another go (a very long history with it being a pile of steaming dinosaur droppings last time I tried, but that was seven years ago). Anyway having done a backup last night I am curious about something that does not appear to be explained in the documentation. Basically the output has a line like the following Total number of objects inspected: 474630 What is this number? Is it the number of files that have changed since the last backup or something else as it is not the number of files on the file system by any stretch of the imagination. One would hope that it inspected everything on the file system... Also it appears that the shadow database is held on the GPFS file system that is being backed up. Is there any way to change the location of that? I am only using one node for backup (because I am cheap and don't like paying for more PVU's than I need to) and would like to hold it on the node doing the backup where I can put it on SSD. Which does to things firstly hopefully goes a lot faster, and secondly reduces the impact on the file system of the backup. Anyway a significant speed up (assuming it worked) was achieved but I note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load average never went above one) and we didn't touch the swap despite only have 24GB of RAM. Though the 10GbE networking did get busy during the transfer of data to the TSM server bit of the backup but during the "assembly stage" it was all a bit quiet, and the DSS-G server nodes where not busy either. What options are there for tuning things because I feel it should be able to go a lot faster. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From stockf at us.ibm.com Thu Oct 17 13:35:18 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 17 Oct 2019 12:35:18 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From makaplan at us.ibm.com Thu Oct 17 15:17:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 17 Oct 2019 10:17:17 -0400 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: Along with what Fred wrote, you can look at the mmbackup doc and also peek into the script and find some options to look at the mmapplypolicy RULEs used, and also capture the mmapplypolicy output which will better show you which files and directories are being examined and so forth. --marc From: "Frederick Stock" To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Date: 10/17/2019 08:43 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] mmbackup questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Jonathan the "objects inspected" refers to the number of file system objects that matched the policy rules used for the backup. These rules are influenced by TSM server and client settings, e.g. the dsm.sys file. So not all objects in the file system are actually inspected. As for tuning I think the mmbackup man page is the place to start, and I think it is thorough in its description of the tuning options. You may also want to look at the mmapplypolicy man page since mmbackup invokes it to scan the file system for files that need to be backed up. To my knowledge there are no options to place the shadow database file in another location than the GPFS file system. If the file system has fast storage I see no reason why you could not use a placement policy rule to place the shadow database on that fast storage. However, I think using more than one node for your backups, and adjusting the various threads used by mmbackup will provide you with sufficient performance improvements. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com ----- Original message ----- From: Jonathan Buzzard Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] mmbackup questions Date: Thu, Oct 17, 2019 8:00 AM I have been looking to give mmbackup another go (a very long history with it being a pile of steaming dinosaur droppings last time I tried, but that was seven years ago). Anyway having done a backup last night I am curious about something that does not appear to be explained in the documentation. Basically the output has a line like the following Total number of objects inspected: 474630 What is this number? Is it the number of files that have changed since the last backup or something else as it is not the number of files on the file system by any stretch of the imagination. One would hope that it inspected everything on the file system... Also it appears that the shadow database is held on the GPFS file system that is being backed up. Is there any way to change the location of that? I am only using one node for backup (because I am cheap and don't like paying for more PVU's than I need to) and would like to hold it on the node doing the backup where I can put it on SSD. Which does to things firstly hopefully goes a lot faster, and secondly reduces the impact on the file system of the backup. Anyway a significant speed up (assuming it worked) was achieved but I note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load average never went above one) and we didn't touch the swap despite only have 24GB of RAM. 
Though the 10GbE networking did get busy during the transfer of data to the TSM server bit of the backup but during the "assembly stage" it was all a bit quiet, and the DSS-G server nodes where not busy either. What options are there for tuning things because I feel it should be able to go a lot faster. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=u_URaXsFxbEw29QGkpa5CnXVGJApxske9lAtEPlerYY&s=mWDp7ziqYJ65-FSCOArzVITL9_qBunPqZ9uC9jgjxn8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From skylar2 at uw.edu Thu Oct 17 15:26:03 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Thu, 17 Oct 2019 14:26:03 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> Message-ID: <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: > I have been looking to give mmbackup another go (a very long history > with it being a pile of steaming dinosaur droppings last time I tried, > but that was seven years ago). > > Anyway having done a backup last night I am curious about something > that does not appear to be explained in the documentation. > > Basically the output has a line like the following > > Total number of objects inspected: 474630 > > What is this number? Is it the number of files that have changed since > the last backup or something else as it is not the number of files on > the file system by any stretch of the imagination. One would hope that > it inspected everything on the file system... I believe this is the number of paths that matched some include rule (or didn't match some exclude rule) for mmbackup. I would assume it would differ from the "total number of objects backed up" line if there were include/exclude rules that mmbackup couldn't process, leaving it to dsmc to decide whether to process. > Also it appears that the shadow database is held on the GPFS file system > that is being backed up. Is there any way to change the location of that? > I am only using one node for backup (because I am cheap and don't like > paying for more PVU's than I need to) and would like to hold it on the > node doing the backup where I can put it on SSD. Which does to things > firstly hopefully goes a lot faster, and secondly reduces the impact on > the file system of the backup. I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment variable noted in the mmbackup man path: Specifies an alternative directory name for storing all temporary and permanent records for the backup. The directory name specified must be an existing directory and it cannot contain special characters (for example, a colon, semicolon, blank, tab, or comma). 
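Untested, but presumably usage would be something along these lines (paths and names are made up, and with a single backup node, since a local path would not be visible to any other -N helper nodes):

   # keep mmbackup's shadow database and temporary files on local SSD
   export MMBACKUP_RECORD_ROOT=/local/ssd/mmbackup-records
   mmbackup gpfs01 -t incremental -N backupnode01 --tsm-servers TSM1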
Which seems like it might provide a mechanism to store the shadow database elsewhere. For us, though, we provide storage via a cost center, so we would want our customers to eat the full cost of their excessive file counts. > Anyway a significant speed up (assuming it worked) was achieved but I > note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load > average never went above one) and we didn't touch the swap despite only > have 24GB of RAM. Though the 10GbE networking did get busy during the > transfer of data to the TSM server bit of the backup but during the > "assembly stage" it was all a bit quiet, and the DSS-G server nodes where > not busy either. What options are there for tuning things because I feel > it should be able to go a lot faster. We have some TSM nodes (corresponding to GPFS filesets) that stress out our mmbackup cluster at the sort step of mmbackup. UNIX sort is not RAM-friendly, as it happens. -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From jonathan.buzzard at strath.ac.uk Thu Oct 17 19:04:47 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Thu, 17 Oct 2019 18:04:47 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> Message-ID: <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> On 17/10/2019 15:26, Skylar Thompson wrote: > On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: >> I have been looking to give mmbackup another go (a very long history >> with it being a pile of steaming dinosaur droppings last time I tried, >> but that was seven years ago). >> >> Anyway having done a backup last night I am curious about something >> that does not appear to be explained in the documentation. >> >> Basically the output has a line like the following >> >> Total number of objects inspected: 474630 >> >> What is this number? Is it the number of files that have changed since >> the last backup or something else as it is not the number of files on >> the file system by any stretch of the imagination. One would hope that >> it inspected everything on the file system... > > I believe this is the number of paths that matched some include rule (or > didn't match some exclude rule) for mmbackup. I would assume it would > differ from the "total number of objects backed up" line if there were > include/exclude rules that mmbackup couldn't process, leaving it to dsmc to > decide whether to process. > After digging through dsminstr.log it would appear to be the sum of the combination of new, changed and deleted files that mmbackup is going to process. There is some wierd sh*t going on though with mmbackup on the face of it, where it sends one file to the TSM server. A line with the total number of files in the file system (aka potential backup candidates) would be nice I think. >> Also it appears that the shadow database is held on the GPFS file system >> that is being backed up. Is there any way to change the location of that? >> I am only using one node for backup (because I am cheap and don't like >> paying for more PVU's than I need to) and would like to hold it on the >> node doing the backup where I can put it on SSD. 
Which does to things >> firstly hopefully goes a lot faster, and secondly reduces the impact on >> the file system of the backup. > > I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment > variable noted in the mmbackup man path: > > Specifies an alternative directory name for > storing all temporary and permanent records for > the backup. The directory name specified must > be an existing directory and it cannot contain > special characters (for example, a colon, > semicolon, blank, tab, or comma). > > Which seems like it might provide a mechanism to store the shadow database > elsewhere. For us, though, we provide storage via a cost center, so we > would want our customers to eat the full cost of their excessive file counts. > We have set a file quota of one million for all our users. So far only one users has actually needed it raising. It does however make users come and have a conversation with us about what they are doing. With the one exception they have found ways to do their work without abusing the file system as a database. We don't have a SSD storage pool on the file system so moving it to the backup node for which we can add SSD cheaply (I mean really really cheap these days) is more realistic that adding some SSD for a storage pool to the file system. Once I am a bit more familiar with it I will try changing it to the system disks. It's not SSD at the moment but if it works I can easily justify getting some and replacing the existing drives (it would just be two RAID rebuilds away). Last time it was brought up you could not add extra shelves to an existing DSS-G system, you had to buy a whole new one. This is despite the servers shipping with a full complement of SAS cards and a large box full of 12Gbps SAS cables (well over ?1000 worth at list I reckon) that are completely useless. Ok they work and I could use them elsewhere but frankly why ship them if I can't expand!!! >> Anyway a significant speed up (assuming it worked) was achieved but I >> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load >> average never went above one) and we didn't touch the swap despite only >> have 24GB of RAM. Though the 10GbE networking did get busy during the >> transfer of data to the TSM server bit of the backup but during the >> "assembly stage" it was all a bit quiet, and the DSS-G server nodes where >> not busy either. What options are there for tuning things because I feel >> it should be able to go a lot faster. > > We have some TSM nodes (corresponding to GPFS filesets) that stress out our > mmbackup cluster at the sort step of mmbackup. UNIX sort is not > RAM-friendly, as it happens. > I have configured more monitoring of the system, and will watch it over the coming days, but nothing was stressed on our system at all as far as I can tell but it was going slower than I had hoped. It was still way faster than a traditional dsmc incr but I was hoping for more though I am not sure why as the backup now takes place well inside my backup window. Perhaps I am being greedy. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Thu Oct 17 19:37:28 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Oct 2019 18:37:28 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> Message-ID: Mmbackup uses tsbuhelper internally. This is effectively a diff of the previous and current policy scan. Objects inspected is the count of these files that are changed since the last time and these are the candidates sent to the TSM server. You mention not being able to upgrade a DSS-G, I thought this has been available for sometime as a special bid process. We did something very complicated with ours at one point. I also thought the "no-upgrade" was related to a support position from IBM on creating additional DAs. You can't add new storage to an DA, but believe it's possible and now supported (I think) to add expansion shelves into a new DA. (I think ESS also supports this). Note that you don't necessarily get the same performance of doing this as if you'd purchased a fully stacked system in the first place. For example if you initially had 166 drives as a two expansion system and then add 84 drives in a new expansion, you now have two DAs, one smaller than the other and neither the same as if you'd originally created it with 250 drives... I don't actually have any benchmarks to prove this, but it was my understanding from various discussions over time. There are also now both DSS (and ESS) configs with both spinning and SSD enclosures. I assume these aren't special bid only products anymore. Simon ?On 17/10/2019, 19:05, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On 17/10/2019 15:26, Skylar Thompson wrote: > On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote: >> I have been looking to give mmbackup another go (a very long history >> with it being a pile of steaming dinosaur droppings last time I tried, >> but that was seven years ago). >> >> Anyway having done a backup last night I am curious about something >> that does not appear to be explained in the documentation. >> >> Basically the output has a line like the following >> >> Total number of objects inspected: 474630 >> >> What is this number? Is it the number of files that have changed since >> the last backup or something else as it is not the number of files on >> the file system by any stretch of the imagination. One would hope that >> it inspected everything on the file system... > > I believe this is the number of paths that matched some include rule (or > didn't match some exclude rule) for mmbackup. I would assume it would > differ from the "total number of objects backed up" line if there were > include/exclude rules that mmbackup couldn't process, leaving it to dsmc to > decide whether to process. > After digging through dsminstr.log it would appear to be the sum of the combination of new, changed and deleted files that mmbackup is going to process. There is some wierd sh*t going on though with mmbackup on the face of it, where it sends one file to the TSM server. A line with the total number of files in the file system (aka potential backup candidates) would be nice I think. >> Also it appears that the shadow database is held on the GPFS file system >> that is being backed up. 
Is there any way to change the location of that? >> I am only using one node for backup (because I am cheap and don't like >> paying for more PVU's than I need to) and would like to hold it on the >> node doing the backup where I can put it on SSD. Which does to things >> firstly hopefully goes a lot faster, and secondly reduces the impact on >> the file system of the backup. > > I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment > variable noted in the mmbackup man path: > > Specifies an alternative directory name for > storing all temporary and permanent records for > the backup. The directory name specified must > be an existing directory and it cannot contain > special characters (for example, a colon, > semicolon, blank, tab, or comma). > > Which seems like it might provide a mechanism to store the shadow database > elsewhere. For us, though, we provide storage via a cost center, so we > would want our customers to eat the full cost of their excessive file counts. > We have set a file quota of one million for all our users. So far only one users has actually needed it raising. It does however make users come and have a conversation with us about what they are doing. With the one exception they have found ways to do their work without abusing the file system as a database. We don't have a SSD storage pool on the file system so moving it to the backup node for which we can add SSD cheaply (I mean really really cheap these days) is more realistic that adding some SSD for a storage pool to the file system. Once I am a bit more familiar with it I will try changing it to the system disks. It's not SSD at the moment but if it works I can easily justify getting some and replacing the existing drives (it would just be two RAID rebuilds away). Last time it was brought up you could not add extra shelves to an existing DSS-G system, you had to buy a whole new one. This is despite the servers shipping with a full complement of SAS cards and a large box full of 12Gbps SAS cables (well over ?1000 worth at list I reckon) that are completely useless. Ok they work and I could use them elsewhere but frankly why ship them if I can't expand!!! >> Anyway a significant speed up (assuming it worked) was achieved but I >> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load >> average never went above one) and we didn't touch the swap despite only >> have 24GB of RAM. Though the 10GbE networking did get busy during the >> transfer of data to the TSM server bit of the backup but during the >> "assembly stage" it was all a bit quiet, and the DSS-G server nodes where >> not busy either. What options are there for tuning things because I feel >> it should be able to go a lot faster. > > We have some TSM nodes (corresponding to GPFS filesets) that stress out our > mmbackup cluster at the sort step of mmbackup. UNIX sort is not > RAM-friendly, as it happens. > I have configured more monitoring of the system, and will watch it over the coming days, but nothing was stressed on our system at all as far as I can tell but it was going slower than I had hoped. It was still way faster than a traditional dsmc incr but I was hoping for more though I am not sure why as the backup now takes place well inside my backup window. Perhaps I am being greedy. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From novosirj at rutgers.edu Fri Oct 18 02:18:04 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 18 Oct 2019 01:18:04 +0000 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: References: Message-ID: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> Found my notes on this; very similar to what Behrooz was saying. This here is from ?mmfsadm dump waiters,selected_files?; as you can see here, we?re looking at thread 29168. Apparently below, ?inodeFlushHolder? corresponds to that same thread in the case I was looking at. You could then look up the inode with ?tsfindinode -i ?, so like for the below, "tsfindinode -i 41538053 /gpfs/cache? on our system. ===== dump waiters ==== Current time 2019-05-01_13:48:26-0400 Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 ===== dump selected_files ===== Current time 2019-05-01_13:48:36-0400 ... OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8 Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823 lock state [ wf: 1 ] x [] flags [ ] Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823 DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821 SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D: ) seq 170823 lock state [ M(2) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 (pfro+pfxw) seq 170822 BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823 treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823 treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw------- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0 locks held in mode xw: 0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0 BRL nXLocksOrRelinquishes 285 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 hasWriterInstance 1 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1 metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 3 dirty status: flushed dirtiedSyncNum 1477623 SMB oplock state: nWriters 1 indBlockDeallocLock: sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0 inodeValid 1 objectVersion 240 flushVersion 8086700 mnodeChangeCount 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 0 synchedFileSize 0 indirectionLevel 1 atime 1556732911.496160000 mtime 1556732911.496479000 ctime 1556732911.496479000 crtime 1556732911.496160000 owner uid 169589 gid 169589 > On Oct 10, 2019, at 4:43 PM, Damir Krstic wrote: > > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? > > in all my looking i have not been able to get that information out of various diagnostic commands. 
> > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Fri Oct 18 08:58:40 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Oct 2019 07:58:40 +0000 Subject: [gpfsug-discuss] mmbackup questions In-Reply-To: References: <960c7d8ff66dacb199b36590f96e5faa815e5b06.camel@strath.ac.uk> <20191017142603.r7dfwrexfnqilsu7@utumno.gs.washington.edu> <9fb0e5a0-3eee-fdf1-526c-498f42d89aea@strath.ac.uk> Message-ID: On 17/10/2019 19:37, Simon Thompson wrote: > Mmbackup uses tsbuhelper internally. This is effectively a diff of > the previous and current policy scan. Objects inspected is the count > of these files that are changed since the last time and these are the > candidates sent to the TSM server. > > You mention not being able to upgrade a DSS-G, I thought this has > been available for sometime as a special bid process. We did > something very complicated with ours at one point. I also thought the > "no-upgrade" was related to a support position from IBM on creating > additional DAs. You can't add new storage to an DA, but believe it's > possible and now supported (I think) to add expansion shelves into a > new DA. (I think ESS also supports this). Note that you don't > necessarily get the same performance of doing this as if you'd > purchased a fully stacked system in the first place. For example if > you initially had 166 drives as a two expansion system and then add > 84 drives in a new expansion, you now have two DAs, one smaller than > the other and neither the same as if you'd originally created it with > 250 drives... I don't actually have any benchmarks to prove this, but > it was my understanding from various discussions over time. > Well it was only the beginning of this year that we asked for a quote for expanding our DSS-G as part of a wider storage upgrade that was to be put to the IT funding committee at the university. I was expecting just to need some more shelves, only to told we need to start again. Like I said if that was the case why ship with all those extra unneeded and unusable SAS cards and SAS cables. At the very least it is not environmentally friendly. Then again the spec that came back had a 2x10Gb LOM, despite the DSS-G documentation being very explicit about needing a 4x1Gb LOM, which is still the case in the 2.4b documentation as of last month. I do note odd numbers of shelves other than one is now supported. That said the tools in at least 2.1 incorrectly states having one shelf is unsupported!!! Presumably they the person writing the tool only tested for even numbers not realizing one while odd was supported. You can also mix shelf types now, but again if I wanted to add some SSD it's a new DSS-G not a couple of D1224 shelves. That also nukes the DA argument for no upgrades I think because you would not be wanting to mix the two in that way. > There are also now both DSS (and ESS) configs with both spinning and > SSD enclosures. I assume these aren't special bid only products > anymore. I don't think so, along with odd numbers of shelves they are in general Lenovo literature. They also have a node with NVMe up the front (or more accurately up the back in PCIe slots), the DSS-G100. My take on the DSS-G is that it is a cost effective way to deploy GPFS storage. 
However there are loads of seemingly arbitrary quirks and limitations, a bit sh*t crazy upgrade procedure and questionable hardware maintenance. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From scale at us.ibm.com Fri Oct 18 09:34:01 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 18 Oct 2019 16:34:01 +0800 Subject: [gpfsug-discuss] waiters and files causing waiters In-Reply-To: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> References: <9E891DBC-2785-4A49-9E4D-D6D2C11B8740@rutgers.edu> Message-ID: Right for the example from Ryan(and according to the thread name, you know that it is writing to a file or directory), but for other cases, it may take more steps to figure out what access to which file is causing the long waiters(i.e., when mmap is being used on some nodes, or token revoke pending from some node, and etc.). Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Ryan Novosielski To: gpfsug main discussion list Date: 2019/10/18 09:18 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] waiters and files causing waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org Found my notes on this; very similar to what Behrooz was saying. This here is from ?mmfsadm dump waiters,selected_files?; as you can see here, we?re looking at thread 29168. Apparently below, ?inodeFlushHolder? corresponds to that same thread in the case I was looking at. You could then look up the inode with ?tsfindinode -i ?, so like for the below, "tsfindinode -i 41538053 /gpfs/cache? on our system. ===== dump waiters ==== Current time 2019-05-01_13:48:26-0400 Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 ===== dump selected_files ===== Current time 2019-05-01_13:48:36-0400 ... 
OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8 Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823 lock state [ wf: 1 ] x [] flags [ ] Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823 DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821 SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D: ) seq 170823 lock state [ M(2) D: ] x [] flags [ ] SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 (pfro+pfxw) seq 170822 BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823 treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823 treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW: BLK [0,INF] mode XW node <403> inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw------- tmmgr node (other) metanode (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0 locks held in mode xw: 0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0 BRL nXLocksOrRelinquishes 285 vfsReference 1 dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000 hasWriterInstance 1 inodeFlushFlag 1 inodeFlushHolder 29168 openInstCount 1 metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1 bufferListCount 1 bufferListChangeCount 3 dirty status: flushed dirtiedSyncNum 1477623 SMB oplock state: nWriters 1 indBlockDeallocLock: sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0 inodeValid 1 objectVersion 240 flushVersion 8086700 mnodeChangeCount 1 block size code 5 (32 subblocksPerFileBlock) dataBytesPerFileBlock 4194304 fileSize 0 synchedFileSize 0 indirectionLevel 1 atime 1556732911.496160000 mtime 1556732911.496479000 ctime 1556732911.496479000 crtime 1556732911.496160000 owner uid 169589 gid 169589 > On Oct 10, 2019, at 4:43 PM, Damir Krstic wrote: > > is it possible via some set of mmdiag --waiters or mmfsadm dump ? to figure out which files or directories access (whether it's read or write) is causing long-er waiters? > > in all my looking i have not been able to get that information out of various diagnostic commands. > > thanks, > damir > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=VdmIfneKWlidYoO2I90hBuZJ2VxXu8L8oq86E7zyh8Q&s=dkQrCzMmxeh6tu0UpPgSIphmRwcBiSpL7QbZPw5RNtI&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=VdmIfneKWlidYoO2I90hBuZJ2VxXu8L8oq86E7zyh8Q&s=dkQrCzMmxeh6tu0UpPgSIphmRwcBiSpL7QbZPw5RNtI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at well.ox.ac.uk Tue Oct 22 10:12:31 2019 From: jon at well.ox.ac.uk (Jon Diprose) Date: Tue, 22 Oct 2019 09:12:31 +0000 Subject: [gpfsug-discuss] AMD Rome support? Message-ID: Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? 
Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN From knop at us.ibm.com Tue Oct 22 17:30:38 2019 From: knop at us.ibm.com (Felipe Knop) Date: Tue, 22 Oct 2019 12:30:38 -0400 Subject: [gpfsug-discuss] AMD Rome support? In-Reply-To: References: Message-ID: Jon, AMD processors which are completely compatible with Opteron should also work. Please also refer to Q5.3 on the SMP scaling limit: 64 cores: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Regards, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Jon Diprose To: gpfsug main discussion list Date: 10/22/2019 05:13 AM Subject: [EXTERNAL] [gpfsug-discuss] AMD Rome support? Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=oNT2koCZX0xmWlSlLblR9Q&m=eizQJGD_5DpnaQUqNkIE3V9qJciVjfLCgo4ZHixZ5Ns&s=JomlTDVPlwFCvLtVOmGd4J6FrfbUK6cMVlLe5Ut638U&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Tue Oct 22 19:40:36 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 22 Oct 2019 18:40:36 +0000 Subject: [gpfsug-discuss] AMD Rome support? In-Reply-To: References: Message-ID: <1c594dbd-4f5c-45aa-57aa-6b610d5c0e86@strath.ac.uk> On 22/10/2019 17:30, Felipe Knop wrote: > Jon, > > AMD processors which are completely compatible with Opteron should also > work. > > Please also refer to Q5.3 on the SMP scaling limit: 64 cores: > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > Hum, is that per CPU or the total for a machine? The reason I ask is we have some large memory nodes (3TB of RAM) and these are quad Xeon 6138 CPU's giving a total of 80 cores in the machine... We have not seen any problems, but if it is 64 cores per machine IBM needs to do some scaling testing ASAP to raise the limit as 64 cores per machine in 2019 is ridiculously low. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Stephan.Peinkofer at lrz.de Wed Oct 23 06:00:44 2019 From: Stephan.Peinkofer at lrz.de (Peinkofer, Stephan) Date: Wed, 23 Oct 2019 05:00:44 +0000 Subject: [gpfsug-discuss] AMD Rome support? 
In-Reply-To: References: Message-ID: <0E081EFD-538E-4E00-A625-54B99F57D960@lrz.de> Dear Jon, we run a bunch of AMD EPYC Naples Dual Socket servers with GPFS in our TSM Server Cluster. From what I can say it runs stable, but IO performance in general and GPFS performance in particular - even compared to an Xeon E5 v3 system - is rather poor. So to put that into perspective on the Xeon Systems with two EDR IB Links, we get 20GB/s read and write performance to GPFS using iozone very easily. On the AMD systems - with all AMD EPYC tuning suggestions applied you can find in the internet - we get around 15GB/s write but only 6GB/s read. We also opened a ticket at IBM for this but never found out anything. Probably because not many are running GPFS on AMD EPYC right now? The answer from AMD basically was that the bad IO performance is expected in Dual Socket systems because the Socket Interconnect is the bottleneck. (See also the IB tests DELL did https://www.dell.com/support/article/de/de/debsdt1/sln313856/amd-epyc-stream-hpl-infiniband-and-wrf-performance-study?lang=en as soon as you have to cross the socket border you get only half of the IB performance) Of course with ROME everything get?s better (that?s what AMD told us through our vendor) but if you have the chance then I would recommend to benchmark AMD vs. XEON with your particular IO workloads before buying. Best Regards, Stephan Peinkofer -- Stephan Peinkofer Dipl. Inf. (FH), M. Sc. (TUM) Leibniz Supercomputing Centre Data and Storage Division Boltzmannstra?e 1, 85748 Garching b. M?nchen URL: http://www.lrz.de On 22. Oct 2019, at 11:12, Jon Diprose > wrote: Dear GPFSUG, I see the faq says Spectrum Scale is supported on "AMD Opteron based servers". Does anyone know if/when support will be officially extended to cover AMD Epyc, especially the new 7002 (Rome) series? Does anyone have any experience of running Spectrum Scale on Rome they could share, in particular for protocol nodes and for plain clients? Thanks, Jon -- Dr. Jonathan Diprose > Tel: 01865 287837 Research Computing Manager Henry Wellcome Building for Genomic Medicine Roosevelt Drive, Headington, Oxford OX3 7BN _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ivano.Talamo at psi.ch Wed Oct 23 10:49:02 2019 From: Ivano.Talamo at psi.ch (Talamo Ivano Giuseppe (PSI)) Date: Wed, 23 Oct 2019 09:49:02 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: Dear all, We are actually in the process of upgrading our CES cluster to 5.0.3-3 but we have doubts about how to proceed. Considering that the CES cluster is in production and heavily used, our plan is to add a new node with 5.0.3-3 to the cluster that is currently 5.0.2.1. And we would like to proceed in a cautious way, so that the new node would not take any IP and just one day per week (when we will declare to be ?at risk?) we would move some IPs to it. After some weeks of tests if we would see no problem we would upgrade the rest of the cluster. But reading these doc [1] it seems that we cannot have multiple GPFS/SMB version in the same cluster. So in that case we could not have a testing/acceptance phase but could only make the full blind jump. 
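As an aside, before moving any CES addresses onto a node running a newer level, it can be worth recording exactly which GPFS and SMB packages each protocol node is running and what the cluster-wide minimum release level is. A rough sketch only (it assumes passwordless ssh between the admin nodes, that the standard cesNodes node class resolves there, and that package names have not changed in your release):

  /usr/lpp/mmfs/bin/mmdiag --version                               # GPFS build running on the local node
  /usr/lpp/mmfs/bin/mmdsh -N cesNodes "rpm -q gpfs.base gpfs.smb"  # installed packages on every protocol node
  /usr/lpp/mmfs/bin/mmlsconfig minReleaseLevel                     # cluster-wide minimum release level

Mixed daemon levels are normal during a rolling upgrade; the SMB level is the part that has to match across the CES nodes.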
Can someone confirm or negate this? Thanks, Ivano [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_updatingsmb.htm On 04.10.19, 12:55, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Malahal R Naineni" wrote: You can use 5.0.3.3 . There is no fix for the sssd issue yet though. I will work with Ganesha upstream community pretty soon. Regards, Malahal. ----- Original message ----- From: Leonardo Sala To: gpfsug main discussion list , "Malahal R Naineni" , Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Fri, Oct 4, 2019 12:02 PM Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: >> @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Hi Ulrich, Ganesha uses innetgr() call for netgroup information and sssd has too many issues in its implementation. Redhat said that they are going to fix sssd synchronization issues in RHEL8. It is in my plate to serialize innergr() call in Ganesha to match kernel NFS server usage! I expect the sssd issue to give EACCESS/EPERM kind of issue but not EINVAL though. If you are using sssd, you must be getting into a sssd issue. Ganesha has a host-ip cache fix in 5.0.2 PTF3. Please make sure you use ganesha version V2.5.3-ibm030.01 if you are using netgroups (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) Regards, Malahal. ----- Original message ----- From: Ulrich Sibiller Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Thu, Dec 13, 2018 7:32 PM On 23.11.2018 14:41, Andreas Mattsson wrote: > Yes, this is repeating. > > We?ve ascertained that it has nothing to do at all with file operations on the GPFS side. > > Randomly throughout the filesystem mounted via NFS, ls or file access will give > > ? > > > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument > > ? > > Trying again later might work on that folder, but might fail somewhere else. > > We have tried exporting the same filesystem via a standard kernel NFS instead of the CES > Ganesha-NFS, and then the problem doesn?t exist. > > So it is definitely related to the Ganesha NFS server, or its interaction with the file system. > > Will see if I can get a tcpdump of the issue. We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is the culprit. 
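A quick way to cross-check what the name service itself (nsswitch/sssd) returns for a netgroup, independently of Ganesha's internal cache, is something like this on the protocol node (the netgroup and host names are just the placeholders used in the log excerpt below):

  getent netgroup netgroup1                             # netgroup as resolved through nsswitch/sssd
  getent netgroup netgroup1 | grep -w client1.domain    # is the client listed at all?

If getent reliably lists the client while Ganesha still rejects it, that points at the caching layer rather than at the netgroup data itself.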
Here some FULL_DEBUG output: 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , ) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18 The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1". I have also opened a support case at IBM for this. @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Kind regards, Ulrich Sibiller -- Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 72070 Tuebingen, Germany https://atos.net/de/deutschland/sc -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From luis.bolinches at fi.ibm.com Wed Oct 23 10:56:57 2019 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Wed, 23 Oct 2019 09:56:57 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch><717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 23 11:14:23 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 23 Oct 2019 10:14:23 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> Message-ID: <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> From our experience, you can generally upgrade the GPFS code node by node, but the SMB code has to be identical on all nodes. So that's basically a do it one day and cross your fingers it doesn't break moment... but it is disruptive as well as you have to stop SMB to do the upgrade. I think there is a long standing RFE open on this about non disruptive SMB upgrades... Simon ?On 23/10/2019, 10:49, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ivano.Talamo at psi.ch" wrote: Dear all, We are actually in the process of upgrading our CES cluster to 5.0.3-3 but we have doubts about how to proceed. Considering that the CES cluster is in production and heavily used, our plan is to add a new node with 5.0.3-3 to the cluster that is currently 5.0.2.1. And we would like to proceed in a cautious way, so that the new node would not take any IP and just one day per week (when we will declare to be ?at risk?) we would move some IPs to it. After some weeks of tests if we would see no problem we would upgrade the rest of the cluster. But reading these doc [1] it seems that we cannot have multiple GPFS/SMB version in the same cluster. So in that case we could not have a testing/acceptance phase but could only make the full blind jump. Can someone confirm or negate this? Thanks, Ivano [1] https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_updatingsmb.htm On 04.10.19, 12:55, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Malahal R Naineni" wrote: You can use 5.0.3.3 . There is no fix for the sssd issue yet though. I will work with Ganesha upstream community pretty soon. Regards, Malahal. ----- Original message ----- From: Leonardo Sala To: gpfsug main discussion list , "Malahal R Naineni" , Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Fri, Oct 4, 2019 12:02 PM Dear Malahal, thanks for the answer. Concerning SSSD, we are also using it, should we use 5.0.2-PTF3? 
We would like to avoid using 5.0.2.2, as it has issues with recent RHEL 7.6 kernels [*] and we are impacted: do you suggest to use 5.0.3.3? cheers leo [*] https://www.ibm.com/support/pages/ibm-spectrum-scale-gpfs-releases-42313-or-later-and-5022-or-later-have-issues-where-kernel-crashes-rhel76-0 Paul Scherrer Institut Dr. Leonardo Sala Group Leader High Performance Computing Deputy Section Head Science IT Science IT WHGA/106 5232 Villigen PSI Switzerland Phone: +41 56 310 3369 leonardo.sala at psi.ch www.psi.ch On 03.10.19 19:15, Malahal R Naineni wrote: >> @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Hi Ulrich, Ganesha uses innetgr() call for netgroup information and sssd has too many issues in its implementation. Redhat said that they are going to fix sssd synchronization issues in RHEL8. It is in my plate to serialize innergr() call in Ganesha to match kernel NFS server usage! I expect the sssd issue to give EACCESS/EPERM kind of issue but not EINVAL though. If you are using sssd, you must be getting into a sssd issue. Ganesha has a host-ip cache fix in 5.0.2 PTF3. Please make sure you use ganesha version V2.5.3-ibm030.01 if you are using netgroups (shipped with 5.0.2 PTF3 but can be used with Scale 5.0.1 or later) Regards, Malahal. ----- Original message ----- From: Ulrich Sibiller Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS Date: Thu, Dec 13, 2018 7:32 PM On 23.11.2018 14:41, Andreas Mattsson wrote: > Yes, this is repeating. > > We?ve ascertained that it has nothing to do at all with file operations on the GPFS side. > > Randomly throughout the filesystem mounted via NFS, ls or file access will give > > ? > > > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument > > ? > > Trying again later might work on that folder, but might fail somewhere else. > > We have tried exporting the same filesystem via a standard kernel NFS instead of the CES > Ganesha-NFS, and then the problem doesn?t exist. > > So it is definitely related to the Ganesha NFS server, or its interaction with the file system. > > Will see if I can get a tcpdump of the issue. We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is the culprit. 
Here some FULL_DEBUG output: 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , ) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18 The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1". I have also opened a support case at IBM for this. @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required. Kind regards, Ulrich Sibiller -- Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 72070 Tuebingen, Germany https://atos.net/de/deutschland/sc -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr. 
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Wed Oct 23 12:20:18 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Wed, 23 Oct 2019 11:20:18 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: On Wed, 2019-10-23 at 10:14 +0000, Simon Thompson wrote: > From our experience, you can generally upgrade the GPFS code node by > node, but the SMB code has to be identical on all nodes. So that's > basically a do it one day and cross your fingers it doesn't break > moment... but it is disruptive as well as you have to stop SMB to do > the upgrade. I think there is a long standing RFE open on this about > non disruptive SMB upgrades... > My understanding is that the issue is the ctdb database suffers from basically being a "memory dump", so a change in the code can effect the database so all the nodes have to be the same. It's the same issue that historically plagued Microsoft Office file formats. Though of course you might get lucky and it just works. I have in the past in the days of role your own because there was no such thing as IBM provided Samba for GPFS done exactly that on several occasions. There was not warnings not to at the time... If you want to do testing before deployment a test cluster is the way forward. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From Ivano.Talamo at psi.ch Wed Oct 23 12:23:22 2019 From: Ivano.Talamo at psi.ch (Talamo Ivano Giuseppe (PSI)) Date: Wed, 23 Oct 2019 11:23:22 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch> <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se> <69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: Yes, thanks for the feedback. We already have a test cluster, so I guess we will go that way, just making sure to stay as close as possible to the production one. Cheers, Ivano On 23.10.19, 13:20, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: On Wed, 2019-10-23 at 10:14 +0000, Simon Thompson wrote: > From our experience, you can generally upgrade the GPFS code node by > node, but the SMB code has to be identical on all nodes. So that's > basically a do it one day and cross your fingers it doesn't break > moment... 
but it is disruptive as well as you have to stop SMB to do > the upgrade. I think there is a long standing RFE open on this about > non disruptive SMB upgrades... > My understanding is that the issue is the ctdb database suffers from basically being a "memory dump", so a change in the code can effect the database so all the nodes have to be the same. It's the same issue that historically plagued Microsoft Office file formats. Though of course you might get lucky and it just works. I have in the past in the days of role your own because there was no such thing as IBM provided Samba for GPFS done exactly that on several occasions. There was not warnings not to at the time... If you want to do testing before deployment a test cluster is the way forward. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From A.Wolf-Reber at de.ibm.com Wed Oct 23 14:05:24 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 23 Oct 2019 13:05:24 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: , <08c2be8b-ece1-a7a4-606f-e63fe854a8b1@psi.ch><717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se><9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se><69D9E45C-1B31-4C5F-97C1-37F9C8ECC6EF@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397183.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397184.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15718124397185.png Type: image/png Size: 1134 bytes Desc: not available URL: From david_johnson at brown.edu Wed Oct 23 16:19:24 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 23 Oct 2019 11:19:24 -0400 Subject: [gpfsug-discuss] question about spectrum scale 5.0.3 installer Message-ID: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu> I built a test cluster a month ago on 14 nodes. Today I want to install two more NSD nodes. When I tried to run the installer, it looks like it is going back and fiddling with the nodes that were installed earlier, and are up and running with the filesystem mounted. I ended up having to abort the install (rebooted the two new nodes because they were stuck on multpath that had had earlier errors), and the messages indicated that the installation failed on all the existing NSD and GUI nodes, but no mention of the two that I wanted to install on. Do I have anything to worry about when I try again (now that multipath is fixed)? I want to be able to incrementally add servers and clients as we go along, and not have the installer messing up previous progress. Can I tell the installer exactly which nodes to work on? Thanks, ? 
ddj Dave Johnson Brown University From david_johnson at brown.edu Wed Oct 23 16:33:01 2019 From: david_johnson at brown.edu (David Johnson) Date: Wed, 23 Oct 2019 11:33:01 -0400 Subject: [gpfsug-discuss] question about spectrum scale 5.0.3 installer In-Reply-To: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu> References: <71C5E053-263A-4889-99EC-63F9A8D5E806@brown.edu> Message-ID: <54DAC656-CEFE-4AF2-BB4F-9A595DD067C4@brown.edu> By the way, we have been dealing with adding and deleting nodes manually since GPFS 3.4, back in 2009. At what point is the spectrumscale command line utility more trouble than it?s worth? > On Oct 23, 2019, at 11:19 AM, David Johnson wrote: > > I built a test cluster a month ago on 14 nodes. Today I want to install two more NSD nodes. > When I tried to run the installer, it looks like it is going back and fiddling with the nodes that > were installed earlier, and are up and running with the filesystem mounted. > > I ended up having to abort the install (rebooted the two new nodes because they were stuck > on multpath that had had earlier errors), and the messages indicated that the installation failed > on all the existing NSD and GUI nodes, but no mention of the two that I wanted to install on. > > Do I have anything to worry about when I try again (now that multipath is fixed)? I want to be > able to incrementally add servers and clients as we go along, and not have the installer > messing up previous progress. Can I tell the installer exactly which nodes to work on? > > Thanks, > ? ddj > Dave Johnson > Brown University From Robert.Oesterlin at nuance.com Thu Oct 24 15:03:25 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 24 Oct 2019 14:03:25 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? Message-ID: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> We recently upgraded our GL4 to a GL6 (trouble free process for those considering FYI). I now have 615T free (raw) in each of my recovery groups. I?d like to increase the size of one of the file systems (currently at 660T, I?d like to add 100T). My first thought was going to be: mmvdisk vdiskset define --vdisk-set fsdata1 --recovery-group rg_gssio1-hs,rg_gssio2-hs --set-size 50T --code 8+2p --block-size 4m --nsd-usage dataOnly --storage-pool data mmvdisk vdiskset create --vdisk-set fs1data1 mmvdisk filesystem add --filesystem fs1 --vdisk-set fs1data1 I know in the past use of mixed size NSDs was frowned upon, not sure on the ESS. The other approach would be add two larger NSDs (current ones are 330T) of 380T, migrate the data to the new ones using mmrestripe, then delete the old ones. The other benefit of this process would be to have the file system data better balanced across all the storage enclosures. Any considerations before I do this? Thoughts? Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Oct 24 16:54:50 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 24 Oct 2019 15:54:50 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> References: <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... 
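Whichever way the extra capacity is added, it is worth checking afterwards how data is spread across the old and new NSDs and rebalancing if needed. A rough sketch, reusing the file system name from the mmvdisk example above (a rebalance is I/O-heavy, so it is usually throttled with mmchqos or run off-peak):

  mmdf fs1              # free space per NSD and pool, shows how balanced things are
  mmlsdisk fs1 -L       # NSD sizes and status
  mmrestripefs fs1 -b   # rebalance existing data across all NSDs in each pool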
URL: From A.Wolf-Reber at de.ibm.com Thu Oct 24 20:43:13 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Thu, 24 Oct 2019 19:43:13 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: References: , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156166.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156167.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156168.png Type: image/png Size: 1134 bytes Desc: not available URL: From lgayne at us.ibm.com Fri Oct 25 18:54:02 2019 From: lgayne at us.ibm.com (Lyle Gayne) Date: Fri, 25 Oct 2019 17:54:02 +0000 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: References: , , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156166.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156167.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15719308156168.png Type: image/png Size: 1134 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Fri Oct 25 18:59:48 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 25 Oct 2019 19:59:48 +0200 Subject: [gpfsug-discuss] ESS - Considerations when adding NSD space? In-Reply-To: References: , <4C700BA6-90D1-40B7-BBDA-48645E74D7F7@nuance.com> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1134 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Mon Oct 28 14:02:57 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 28 Oct 2019 14:02:57 +0000 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP Message-ID: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> This relates to V 5.0.3. If my CES server node has system defined authentication using LDAP, should I expect that setting my authentication setting of ?userdefined? using mmuserauth to work? That doesn?t seem to be the case for me. Is there some other setting I should be using? I tried using LDAP in mmuserauth, and that promptly stomped on my sssd.conf file on that node which broke everything. Any by the way, stores a plain text password in the sssd.conf file just for good measure! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
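For reference, the file-protocol side would normally be switched to userdefined with something along these lines (a sketch only; mmuserauth changes are cluster-wide, and an already configured method may first have to be removed with mmuserauth service remove, so test outside production):

  mmuserauth service list                                                  # what is configured today
  mmuserauth service create --data-access-method file --type userdefined

The intent of userdefined mode is that Scale leaves the operating-system authentication setup (sssd, LDAP, Kerberos) entirely to the administrator instead of rewriting it.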
URL: From valdis.kletnieks at vt.edu Mon Oct 28 17:12:08 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Mon, 28 Oct 2019 13:12:08 -0400 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP In-Reply-To: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> References: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> Message-ID: <55677.1572282728@turing-police> On Mon, 28 Oct 2019 14:02:57 -0000, "Oesterlin, Robert" said: > Any by the way, stores a plain text password in the sssd.conf file just for > good measure! Note that if you want the system to come up without intervention, at best you can only store an obfuscated password, not a securely encrypted one. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From jonathan.buzzard at strath.ac.uk Tue Oct 29 10:14:57 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 29 Oct 2019 10:14:57 +0000 Subject: [gpfsug-discuss] Question on CES Authentication - LDAP In-Reply-To: <55677.1572282728@turing-police> References: <15B1F438-38DD-4F2C-89CA-5C4EE8929CFA@nuance.com> <55677.1572282728@turing-police> Message-ID: <1d324529a566cdd262a8874e48938002f9c1b4d0.camel@strath.ac.uk> On Mon, 2019-10-28 at 13:12 -0400, Valdis Kl?tnieks wrote: > On Mon, 28 Oct 2019 14:02:57 -0000, "Oesterlin, Robert" said: > > Any by the way, stores a plain text password in the sssd.conf file > > just for good measure! > > Note that if you want the system to come up without intervention, at > best you can only store an obfuscated password, not a securely > encrypted one. > Kerberos and a machine account spring to mind. Crazy given Kerberos is a Unix technology everyone seems to forget about it. Also my understanding is that in theory a TPM module in your server can be used for this https://en.wikipedia.org/wiki/Trusted_Platform_Module Support in Linux is weak at best, but basically it can be used to store passwords and it can be tied to the system. Locality and physical presence being the terminology used. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From linesr at janelia.hhmi.org Thu Oct 31 15:23:59 2019 From: linesr at janelia.hhmi.org (Lines, Robert) Date: Thu, 31 Oct 2019 15:23:59 +0000 Subject: [gpfsug-discuss] Inherited ACLs and multi-protocol access Message-ID: I know I am missing something here and it is probably due to lack of experience dealing with ACLs as all other storage we distil down to just posix UGO permissions. We have Windows native clients creating data. There are SMB clients of various flavors accessing data via CES. Then there are Linux native clients that interface between gpfs and other NFS filers for data movement. What I am running into is around inheriting permissions so that windows native and smb clients have access based on the users group membership that remains sane while also being able to migrate files off to nfs filers with reasonable posix permissions. Here is the top level directory that is the lab name and there is a matching group. That directory is the highest point where an ACL has been set with inheritance. The directory listed is one created from a Windows Native client. The issue I am running into is that that largec7 directory that was created is having the posix permissions set to nothing for the owner. 
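As an immediate workaround on an affected directory, the owner bits can be put back, or the ACL dumped, edited and re-applied. A rough sketch with placeholder paths (how chmod interacts with an NFSv4 ACL depends on the fileset's --allow-permission-change setting, so test on a scratch directory first):

  chmod u+rwx /gpfs/fs1/smith/largec7                    # restore the owner rwx bits

  mmgetacl -o /tmp/largec7.acl /gpfs/fs1/smith/largec7   # dump the current NFSv4 ACL
  # edit /tmp/largec7.acl, e.g. change the inherited user:root entry back to special:owner@
  mmputacl -i /tmp/largec7.acl /gpfs/fs1/smith/largec7   # re-apply the edited ACL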
The ACL that results is okay but when that folder or anything in it is synced off to another filer that only has the basic posix permission it acts kinda wonky. The user was able to fix up his files on the other filer because he was still the owner but I would like to make it work properly. [root at gpfs-dm1 smith]# ls -la drwxrwsr-x 84 root smith 16384 Oct 30 23:22 . d---rwsr-x 2 tim smith 4096 Oct 30 23:22 largec7 drwx--S--- 2 tim smith 4096 Oct 24 00:17 CFA1 [root at gpfs-dm1 smith]# mmgetacl . #NFSv4 ACL #owner:root #group:smith special:owner@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwxc:allow:FileInherit:DirInherit (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:r-x-:allow:FileInherit:DirInherit (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED [root at gpfs-dm1 smith]# mmgetacl largec7 #NFSv4 ACL #owner:tim #group:smith #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED user:root:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:group@:rwxc:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:everyone@:r-x-:allow:FileInherit:DirInherit:Inherited (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED In contrast the CFA1 directory was created prior to the file and directory inheritance being put in place. That worked okay as long as it was only that user but the lack of group access is a problem and what led to trying to sort out the inherited ACLs in the first place. [root at gpfs-dm1 smith]# ls -l drwx--S--- 2 tim smith 4096 Oct 24 00:17 CFA1 [root at gpfs-dm1 smith]# mmgetacl CFA1 #NFSv4 ACL #owner:tim #group:smith #ACL flags: # DACL_PRESENT # DACL_AUTO_INHERITED # SACL_AUTO_INHERITED special:owner@:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED user:15000001:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (X)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED user:15000306:r-x-:allow (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED Thank you for any suggestions. -- Rob Lines Sr. HPC Engineer HHMI Janelia Research Campus -------------- next part -------------- An HTML attachment was scrubbed... URL: