From andreas.mattsson at maxiv.lu.se Fri Jan 4 09:09:03 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Fri, 4 Jan 2019 09:09:03 +0000 Subject: [gpfsug-discuss] Filesystem access issues via CES NFS In-Reply-To: References: <717f49aade0b439eb1b99fc620a21cac@maxiv.lu.se> <9456645b0a1f4b488b13874ea672b9b8@maxiv.lu.se>, Message-ID: Just reporting back that the issue we had seems to have been solved. In our case it was fixed by applying hotfix-packages from IBM. Did this in December and I can no longer trigger the issue. Hopefully, it'll stay fixed when we get full production load on the system again now in January. Also, as far as I can see, it looks like Scale 5.0.2.2 includes these packages already. Regards, Andreas mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Ulrich Sibiller Skickat: den 13 december 2018 14:52:42 Till: gpfsug-discuss at spectrumscale.org ?mne: Re: [gpfsug-discuss] Filesystem access issues via CES NFS On 23.11.2018 14:41, Andreas Mattsson wrote: > Yes, this is repeating. > > We?ve ascertained that it has nothing to do at all with file operations on the GPFS side. > > Randomly throughout the filesystem mounted via NFS, ls or file access will give > > ? > > > ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument > > ? > > Trying again later might work on that folder, but might fail somewhere else. > > We have tried exporting the same filesystem via a standard kernel NFS instead of the CES > Ganesha-NFS, and then the problem doesn?t exist. > > So it is definitely related to the Ganesha NFS server, or its interaction with the file system. > > Will see if I can get a tcpdump of the issue. We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the ganesha 2.5.3 code and I think the netgroup caching is the culprit. 
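As an independent cross-check of what the cache should be returning, a quick sketch along these lines can be run on a CES node. This is only an illustration (not taken from the Ganesha code); the IP and netgroup names are the placeholders from the log below, and it assumes the usual (host,user,domain) triple output of "getent netgroup":

#!/usr/bin/env python3
# Compare the system's own netgroup answer with what Ganesha's cache
# appears to decide. Hostnames/netgroups below are placeholders.
import socket
import subprocess

def netgroup_hosts(netgroup):
    # "getent netgroup NAME" prints the name followed by (host,user,domain)
    # triples; collect the host field of each triple.
    out = subprocess.run(["getent", "netgroup", netgroup],
                         capture_output=True, text=True, check=False)
    hosts = set()
    for triple in out.stdout.replace(netgroup, "", 1).split(")"):
        triple = triple.strip().lstrip("(")
        if triple:
            hosts.add(triple.split(",")[0].strip())
    return hosts

def check(client_ip, netgroup):
    # Reverse lookup, analogous to the nfs_ip_name_get step in the log;
    # needs working reverse DNS for the client address.
    name = socket.gethostbyaddr(client_ip)[0]
    members = netgroup_hosts(netgroup)
    ok = name in members or name.split(".")[0] in members
    print("%s -> %s: member of %s = %s" % (client_ip, name, netgroup, ok))

if __name__ == "__main__":
    check("1.2.3.4", "netgroup1")

If that reports the client as a member while Ganesha still falls through to the other netgroups, the cache (rather than the netgroup definition itself) looks suspect.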
Here some FULL_DEBUG output:

2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , )
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18

The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1". I have also opened a support case at IBM for this. @Malahal: Looks like you have written the netgroup caching code, feel free to ask for further details if required.

Kind regards,

Ulrich Sibiller

-- Dipl.-Inf. Ulrich Sibiller science + computing ag System Administration Hagellocher Weg 73 72070 Tuebingen, Germany https://atos.net/de/deutschland/sc -- Science + Computing AG Vorstandsvorsitzender/Chairman of the board of management: Dr.
Martin Matzke Vorstand/Board of Management: Matthias Schempp, Sabine Hohenstein Vorsitzender des Aufsichtsrats/ Chairman of the Supervisory Board: Philippe Miltin Aufsichtsrat/Supervisory Board: Martin Wibbe, Ursula Morgenstern Sitz/Registered Office: Tuebingen Registergericht/Registration Court: Stuttgart Registernummer/Commercial Register No.: HRB 382196 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From roblogie at au1.ibm.com Tue Jan 8 21:49:51 2019 From: roblogie at au1.ibm.com (Rob Logie) Date: Tue, 8 Jan 2019 21:49:51 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Message-ID: Hi All Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes. Thanks ! Regards, Rob Logie IT Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Tue Jan 8 21:53:51 2019 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 8 Jan 2019 16:53:51 -0500 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: Message-ID: Adding Ingo Meents for response From: "Rob Logie" To: gpfsug-discuss at spectrumscale.org Date: 01/08/2019 04:50 PM Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes. Thanks ! Regards, Rob Logie IT Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From arc at b4restore.com Wed Jan 9 10:25:13 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 10:25:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Jan 9 11:16:49 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 9 Jan 2019 11:16:49 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. 
In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 12:19:30 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 12:19:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: on behalf of "arc at b4restore.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From arc at b4restore.com Wed Jan 9 13:23:17 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 13:23:17 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: References: Message-ID: <1886db2cdf074bf0aaa151c395d300d5@B4RWEX01.internal.b4restore.com> Hi Andrew, Where can I request such a feature? ? Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Andrew Beattie Sendt: 9. januar 2019 12:17 Til: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. Andi, All the CES nodes in the same cluster will share the same protocol exports if you want to separate them you need to create remote mount clusters and export the additional protocols via the remote mount it would actually be a useful RFE to have the ablity to create CES groups attached to the base cluster and by group create exports of different protocols, but its not available today. Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Andi Rhod Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Date: Wed, Jan 9, 2019 8:31 PM Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? 
If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From arc at b4restore.com Wed Jan 9 13:24:30 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 13:24:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> References: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> Message-ID: Hi Simon, It was actually also the only solution I found if I want to keep them within the same cluster ? Thanks for the reply, I will see what we figure out ! Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Simon Thompson Sendt: 9. januar 2019 13:20 Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: > on behalf of "arc at b4restore.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 9 14:05:48 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 9 Jan 2019 14:05:48 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: References: Message-ID: <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com> The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?. This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. 
This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 16:35:37 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 16:35:37 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com> References: , <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com> Message-ID: I think only recently was remote cluster support added (though we have been doing it since CES was released). I agree that capacity licenses have freed us to implement a better solution.. no longer do we run quorum/token managers on nsd nodes to reduce socket costs. I believe socket based licenses are also about to or already no longer available for new customers (existing customers can continue to buy). Carl can probably comment on this? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Paul.Sanchez at deshaw.com [Paul.Sanchez at deshaw.com] Sent: 09 January 2019 14:05 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?. This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. 
You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist From aspalazz at us.ibm.com Wed Jan 9 17:21:03 2019 From: aspalazz at us.ibm.com (Aaron S Palazzolo) Date: Wed, 9 Jan 2019 17:21:03 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 18:04:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 18:04:47 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: References: , Message-ID: Can you use node affinity within CES groups? For example I have some shiny new servers I want to normally use. If I plan maintenance, I move the IP to another shiny box. But I also have some old off support legacy hardware that I'm happy to use in a DR situation (e.g. they are in another site). So I want a group for my SMB boxes and NFS boxes, but have affinity normally, and then have old hardware in case of failure. Whilst we're on protocols, are there any restrictions on using mixed architectures? I don't recall seeing this but... E.g. my new shiny boxes are ppc64le systems and my old legacy nodes are x86. It's all ctdb locking right .. (ok maybe mixing be and le hosts would be bad) (Sure I'll take a performance hit when I fail to the old nodes, but that is better than no service). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of aspalazz at us.ibm.com [aspalazz at us.ibm.com] Sent: 09 January 2019 17:21 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation Hey guys - I wanted to reply from the Scale development side..... First off, consider CES as a stack and the implications of such: - all protocols are installed on all nodes - if a specific protocol is enabled (SMB, NFS, OBJ, Block), it's enabled for all protocol nodes - if a specific protocol is started (SMB, NFS, OBJ, Block), it's started on all nodes by default, unless manually specified. As was indicated in the e-mail chain, you don't want to be removing rpms to create a subset of nodes serving various protocols as this will cause overall issues. You also don't want to manually be disabling protocols on some nodes/not others in order to achieve nodes that are 'only serving' SMB, for instance. Doing this manual stopping/starting of protocols isn't something that will adhere to failover. 
=============================================================== A few possible solutions if you want to segregate protocols to specific nodes are: =============================================================== 1) CES-Groups in combination with specific IPs / DNS hostnames that correspond to each protocol. - As mentioned, this can still be bypassed if someone attempts a mount using an IP/DNS name not set for their protocol. However, you could probably prevent some of this with an external firewall rule. - Using CES-Groups confines the IPs/DNS hostnames to very specific nodes 2) Firewall rules - This is best if done external to the cluster, and at a level that can restrict specific protocol traffic to specific IPs/hostnames - combine this with #1 for the best results. - Although it may work, try to stay away from crazy firewall rules on each protocol node itself as this can get confusing very quickly. It's easier if you can set this up external to the nodes. 3) Similar to above but using Node Affinity CES-IP policy - but no CES groups. - Upside is node-affinity will attempt to keep your CES-IPs associated with specific nodes. So if you restrict specific protocol traffic to specific IPs, then they'll stay on nodes you designate - Watch out for failovers. In error cases (or upgrades) where an IP needs to move to another node, it obviously can't remain on the node that's having issues. This means you may have protocol trafffic crossover when this occurs. 4) A separate remote cluster for each CES protocol - In this example, you could make fairly small remote clusters (although we recommend 2->3nodes at least for failover purposes). The local cluster would provide the storage. The remote clusters would mount it. One remote cluster could have only SMB enabled. Another remote cluster could have only OBJ enabled. etc... ------ I hope this helps a bit.... Regards, Aaron Palazzolo IBM Spectrum Scale Deployment, Infrastructure, Virtualization 9042 S Rita Road, Tucson AZ 85744 Phone: 520-799-5161, T/L: 321-5161 E-mail: aspalazz at us.ibm.com ----- Original message ----- From: gpfsug-discuss-request at spectrumscale.org Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: gpfsug-discuss Digest, Vol 84, Issue 4 Date: Wed, Jan 9, 2019 7:13 AM Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale protocol node service separation. (Andi Rhod Christiansen) 2. Re: Spectrum Scale protocol node service separation. (Sanchez, Paul) ---------------------------------------------------------------------- Message: 1 Date: Wed, 9 Jan 2019 13:24:30 +0000 From: Andi Rhod Christiansen To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: Content-Type: text/plain; charset="utf-8" Hi Simon, It was actually also the only solution I found if I want to keep them within the same cluster ? Thanks for the reply, I will see what we figure out ! 
Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Simon Thompson Sendt: 9. januar 2019 13:20 Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: > on behalf of "arc at b4restore.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 9 Jan 2019 14:05:48 +0000 From: "Sanchez, Paul" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: <53ec54bb621242109a789e51d61b1377 at mbxtoa1.winmail.deshaw.com> Content-Type: text/plain; charset="utf-8" The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?. This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. 
Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 4 ********************************************* From christof.schmitt at us.ibm.com Wed Jan 9 18:10:13 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 9 Jan 2019 18:10:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jan 9 19:03:25 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 9 Jan 2019 19:03:25 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=8FBB09EFDFEBBB408f9e8a93df938690918c8FB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From carlz at us.ibm.com Wed Jan 9 19:19:20 2019 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 9 Jan 2019 19:19:20 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation Message-ID: ST>I believe socket based licenses are also about to or already no longer available ST>for new customers (existing customers can continue to buy). ST>Carl can probably comment on this? That is correct. Friday Jan 11 is the last chance for *new* customers to buy Standard Edition sockets. And as Simon says, those of you who are currently Sockets customers can remain on Sockets, buying additional licenses and renewing existing licenses. (IBM Legal requires me to add, any statement about the future is an intention, not a commitment -- but, as I've said before, as long as it's my decision to make, my intent is to keep Sockets as long as existing customers want them). And yes, one of the reasons I wanted to get away from Socket pricing is the kind of scenarios some of you brought up. Implementing the best deployment topology for your needs shouldn't be a licensing transaction. (Don't even get me started on client licenses). regards, Carl Zetie Program Director Offering Management for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From cblack at nygenome.org Wed Jan 9 19:11:40 2019 From: cblack at nygenome.org (Christopher Black) Date: Wed, 9 Jan 2019 19:11:40 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: Message-ID: <7399F5C1-A23F-4852-B912-0965E111D191@nygenome.org> We use realmd and some automation for sssd configs to get linux hosts to have local login and ssh tied to AD accounts, however we do not apply these configs on our protocol nodes. 
From: on behalf of Christof Schmitt Reply-To: gpfsug main discussion list Date: Wednesday, January 9, 2019 at 2:03 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" , Ingo Meents Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol There is the PAM module that would forward authentication requests to winbindd: /usr/lpp/mmfs/lib64/security/pam_gpfs-winbind.so In theory that can be added to the PAM configuration in /etc/pam.d/. On the other hand, we have never tested this nor claimed support, so there might be reasons why this won't work. Other customers have configured sssd manually in addition to the Scale authentication to allow user logon and authentication for sudo. If the request here is to configure AD authentication through mmuserauth and that should also provide user logon, that should probably be treated as a feature request through RFE. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Lyle Gayne" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Ingo Meents Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Date: Tue, Jan 8, 2019 2:54 PM Adding Ingo Meents for response [Inactive hide details for "Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory a]"Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory authentication on CES From: "Rob Logie" To: gpfsug-discuss at spectrumscale.org Date: 01/08/2019 04:50 PM Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes. Thanks ! Regards, Rob Logie IT Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 8 22:12:22 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 8 Jan 2019 22:12:22 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? 
Message-ID: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Wed Jan 9 21:37:04 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 9 Jan 2019 21:37:04 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> I suppose you could run the underlying tslsfileset, though that's probably not the answer you're looking for. Out of curiousity, what are you hoping to gain by not running mmlsfileset? Is the problem scaling due to the number of filesets that you have defined? On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Happy New Year to all! Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best! > > Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done. > > The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all. > > Thanks in advance. > > Kevin > > P.S. 
I am still working on metadata and iSCSI testing and will report back on that when complete. > P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. > > ??? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From S.J.Thompson at bham.ac.uk Wed Jan 9 22:42:01 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 22:42:01 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: Hi Kevin, Have you looked at the rest API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From Paul.Sanchez at deshaw.com Wed Jan 9 23:03:08 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 9 Jan 2019 23:03:08 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? 
In-Reply-To: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> Message-ID: <3d408800d50648dfae25c3c95c1f04c1@mbxtoa1.winmail.deshaw.com> You could also wrap whatever provisioning script you're using (the thing that runs mmcrfileset), which must already be running as root, so that it also updates the cached text file afterward. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Skylar Thompson Sent: Wednesday, January 9, 2019 4:37 PM To: Kevin.Buterbaugh at Vanderbilt.Edu Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? I suppose you could run the underlying tslsfileset, though that's probably not the answer you're looking for. Out of curiousity, what are you hoping to gain by not running mmlsfileset? Is the problem scaling due to the number of filesets that you have defined? On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Happy New Year to all! Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best! > > Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done. > > The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all. > > Thanks in advance. > > Kevin > > P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. > P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. > > ??? > Kevin Buterbaugh - Senior System Administrator Vanderbilt University - > Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 9 23:07:00 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 9 Jan 2019 23:07:00 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? 
In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: Hi All, Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not. The Python script in question is to display quota information for a user. The mmlsquota command has a couple of issues: 1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it. For example, it will display all the filesets in a filesystem whether or not the user has access to them. So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format. And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for. Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 4:42 PM, Simon Thompson > wrote: Hi Kevin, Have you looked at the rest API? https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&reserved=0 I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. 
We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Jan 10 01:13:55 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 10 Jan 2019 01:13:55 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_runningmmlsfileset? In-Reply-To: References: , <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 10 20:42:50 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 10 Jan 2019 20:42:50 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_runningmmlsfileset? In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> Hi Andrew / All, Well, it does _sound_ useful, but in its current state it?s really not for several reasons, mainly having to do with it being coded in a moderately site-specific way. It needs an overhaul anyway, so I?m going to look at getting rid of as much of that as possible (there?s some definite low-hanging fruit there) and, for the site-specific things that can?t be gotten rid of, maybe consolidating them into one place in the code so that the script could be more generally useful if you just change those values. If I can accomplish those things, then yes, we?d be glad to share the script. But I?ve also realized that I didn?t _entirely_ answer my original question. Yes, mmlsquota will show me all the filesets ? but I also need to know the junction path for each of those filesets. One of the main reasons we wrote this script in the first place is that if you run mmlsquota you see that you have no limits on about 60 filesets (currently we use fileset quotas only on our filesets) ? and that?s because there are no user (or group) quotas in those filesets. The script, however, reads in that text file that is created nightly by root that is nothing more than the output of ?mmlsfileset ?, gets the junction path, looks up the GID of the junction path, and sees if you?re a member of that group. If you?re not, well, no sense in showing you anything about that fileset. But, of course, if you are a member of that group, then we do want to show you the fileset quota for that fileset. So ? my question now is, ?Is there a way for a non-root user? to get the junction path for the fileset(s)? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 7:13 PM, Andrew Beattie > wrote: Kevin, That sounds like a useful script would you care to share? 
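For reference, the group-ownership filtering described above comes down to something like this minimal sketch. The dump file location and its "name<TAB>junction-path" format are made-up assumptions, not the real script; the quota display itself would still come from mmlsquota, which a non-root user can run for their own IDs:

#!/usr/bin/env python3
# Keep only the filesets whose junction-path group the calling user
# belongs to, based on a nightly mmlsfileset dump (hypothetical format).
import os

DUMP = "/opt/site/etc/filesets.txt"  # hypothetical nightly dump location

def my_gids():
    # GIDs of the calling (non-root) user
    return set(os.getgroups()) | {os.getgid()}

def relevant_filesets(dump=DUMP):
    gids = my_gids()
    keep = []
    with open(dump) as fh:
        for line in fh:
            name, junction = line.rstrip("\n").split("\t")
            try:
                st = os.stat(junction)
            except OSError:
                continue  # junction not visible to this user, or unlinked
            if st.st_gid in gids:
                keep.append((name, junction))
    return keep

if __name__ == "__main__":
    for name, junction in relevant_filesets():
        print(name, junction)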
Thanks Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Buterbaugh, Kevin L" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Date: Thu, Jan 10, 2019 9:22 AM Hi All, Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not. The Python script in question is to display quota information for a user. The mmlsquota command has a couple of issues: 1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it. For example, it will display all the filesets in a filesystem whether or not the user has access to them. So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format. And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for. Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 4:42 PM, Simon Thompson > wrote: Hi Kevin, Have you looked at the rest API? https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&reserved=0 I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. 
That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cc1ffac821c5f4524104908d67698e948%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826796467009700&sdata=Xfz4JiItI8ukHgnvO5YoN27jVpk6Ngsk03NtMrKJcHk%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Fri Jan 11 12:50:17 2019 From: p.childs at qmul.ac.uk (Peter Childs) Date: Fri, 11 Jan 2019 12:50:17 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. 
There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 11 14:19:28 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Jan 2019 14:19:28 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades Message-ID: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. 
On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Fri Jan 11 14:58:20 2019 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Fri, 11 Jan 2019 15:58:20 +0100 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: Hi Simon, you likely run into the following issue: APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896 This problem happens only if you use different host domains within a cluster and will mostly impact CES. It is unrelated to upgrade or mixed version clusters. Its has been fixed with 5.0.2, therefore I recommend to upgrade soon. 
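A quick way to spot the duplicated-domain symptom described above without eyeballing the list is to flag any node name whose last two DNS labels repeat; a small sketch, assuming the usual /usr/lpp/mmfs/bin location for tsctl:

import subprocess

TSCTL = "/usr/lpp/mmfs/bin/tsctl"      # standard GPFS binary location

def doubled_suffix_nodes():
    # "tsctl shownodes up" returns a comma-separated list of node names; flag any
    # whose last two DNS labels are identical (e.g. ...cluster.cluster).
    out = subprocess.run([TSCTL, "shownodes", "up"],
                         capture_output=True, text=True, check=True).stdout
    bad = []
    for node in out.strip().split(","):
        labels = node.strip().split(".")
        if len(labels) >= 2 and labels[-1] == labels[-2]:
            bad.append(node.strip())
    return bad

if __name__ == "__main__":
    for node in doubled_suffix_nodes():
        print("suspicious node name:", node)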
Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 11/01/2019 15:19 Subject: [gpfsug-discuss] A cautionary tale of upgrades Sent by: gpfsug-discuss-bounces at spectrumscale.org I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. 
In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Fri Jan 11 15:00:51 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 11 Jan 2019 15:00:51 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: Hallo Simon, Welcome to the Club. These behavior are a Bug in tsctl to change the DNS names . We had this already 4 weeks ago. The fix was Update to 5.0.2.1. Regards Renar Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Am 11.01.2019 um 15:19 schrieb Simon Thompson >: I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. 
But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! 
Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 11 15:48:50 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Jan 2019 15:48:50 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>, Message-ID: Could well be. Still it's pretty scary that this sort of thing could hit you way after the different DNS name nodes were added. It might be months before you restart the CES nodes. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of MDIETZ at de.ibm.com [MDIETZ at de.ibm.com] Sent: 11 January 2019 14:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] A cautionary tale of upgrades Hi Simon, you likely run into the following issue: APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896 This problem happens only if you use different host domains within a cluster and will mostly impact CES. It is unrelated to upgrade or mixed version clusters. Its has been fixed with 5.0.2, therefore I recommend to upgrade soon. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 11/01/2019 15:19 Subject: [gpfsug-discuss] A cautionary tale of upgrades Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. 
Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Fri Jan 11 17:31:35 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 11 Jan 2019 14:31:35 -0300 Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset? In-Reply-To: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> Message-ID: ?Is there a way for a non-root user? to get the junction path for the fileset(s)? Presuming the user has some path to some file in the fileset... Issue `mmlsattr -L path` then "walk" back towards the root by discarding successive path suffixes and watch for changes in the fileset name field Why doesn't mmlsfileset work for non-root users? I don't know. 
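Taken literally, that mmlsattr walk-back might look something like the sketch below; the "fileset name:" label it looks for is assumed from mmlsattr -L output and should be verified, as should whether mmlsattr is callable by the user on a given path.

import os
import subprocess

MMLSATTR = "/usr/lpp/mmfs/bin/mmlsattr"

def fileset_of(path):
    # Pull the fileset name out of "mmlsattr -L" output. The "fileset name:"
    # label is an assumption - check the exact wording on your release.
    out = subprocess.run([MMLSATTR, "-L", path],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.lower().startswith("fileset name"):
            return line.split(":", 1)[1].strip()
    return None

def junction_path(start_path):
    # Walk towards the root; the last directory still reporting the same fileset
    # name as start_path is (presumably) the fileset's junction.
    path = os.path.realpath(start_path)
    if not os.path.isdir(path):
        path = os.path.dirname(path)
    target = fileset_of(path)
    junction = path
    while path != os.path.dirname(path):          # stop once we reach "/"
        parent = os.path.dirname(path)
        if fileset_of(parent) != target:
            break
        junction = parent
        path = parent
    return junction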
Perhaps the argument has to do with security or confidentiality. On my test system it gives a bogus error, when it should say something about root or super-user. -------------- next part -------------- An HTML attachment was scrubbed... URL: From JRLang at uwyo.edu Fri Jan 11 16:24:17 2019 From: JRLang at uwyo.edu (Jeffrey R. Lang) Date: Fri, 11 Jan 2019 16:24:17 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> Message-ID: What we do is the use ?mmlsquota -Y ? which will list out all the filesets in an easily parseable format. And the command can be run by the user. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Friday, January 11, 2019 6:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? ? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources. We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sat Jan 12 03:07:29 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sat, 12 Jan 2019 03:07:29 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> Message-ID: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> Hi All, I appreciate the time several of you have taken to respond to my inquiry. However, unless I?m missing something - and my apologies if I am - none so far appear to allow me to obtain the list of junction paths as a non-root user. Yes, mmlsquota shows all the filesets. But from there I need to then be able to find out where that fileset is mounted in the directory tree so that I can see who the owner and group of that directory are. Only if the user running the script is either the owner or a member of the group do I want to display the fileset quota for that fileset to the user. Thanks again? Kevin On Jan 11, 2019, at 10:24 AM, Jeffrey R. Lang > wrote: What we do is the use ?mmlsquota -Y ? which will list out all the filesets in an easily parseable format. And the command can be run by the user. From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peter Childs Sent: Friday, January 11, 2019 6:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? ? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources. We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. 
There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cee10c1e22a474fedceb408d678318231%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636828551398056004&sdata=F56RKhMef0zYjAj2dKFu3bAuq7xQvFoulYhwDnfN1Ms%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Jan 12 20:42:42 2019 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Sat, 12 Jan 2019 15:42:42 -0500 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> Message-ID: <13713.1547325762@turing-police.cc.vt.edu> On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: > But from there I need to then be able to find out where that fileset is > mounted in the directory tree so that I can see who the owner and group of that > directory are. You're not able to leverage a local naming scheme? There's no connection between the name of the fileset and where it is in the tree? I would hope there is, because otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will now be confused over what director(y/ies) need to be cleaned up. If your tool says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at /gpfs/foo/bar/baz then it's actionable. And if the user knows what the mapping is, your script can know it too.... From scottg at emailhosting.com Mon Jan 14 04:09:57 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Sun, 13 Jan 2019 23:09:57 -0500 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <13713.1547325762@turing-police.cc.vt.edu> Message-ID: Kevin, Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output. As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system. Sent from my BlackBerry - the most secure mobile device ? 
Original Message ? From: valdis.kletnieks at vt.edu Sent: January 12, 2019 4:07 PM To: gpfsug-discuss at spectrumscale.org Reply-to: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: > But from there I need to then be able to find out where that fileset is > mounted in the directory tree so that I can see who the owner and group of that > directory are. You're not able to leverage a local naming scheme? There's no connection between the name of the fileset and where it is in the tree?? I would hope there is, because otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will now be confused over what director(y/ies) need to be cleaned up.? If your tool says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at /gpfs/foo/bar/baz then it's actionable. And if the user knows what the mapping is, your script can know it too.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Mon Jan 14 06:31:28 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 14 Jan 2019 07:31:28 +0100 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From sandeep.patil at in.ibm.com Mon Jan 14 12:54:29 2019 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Mon, 14 Jan 2019 12:54:29 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q4 2018) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements http://www.redbooks.ibm.com/abstracts/redp5525.html?Open IBM Spectrum Scale Memory Usage https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2 Spectrum Scale and Containers https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/ IBM Elastic Storage Server Performance Graphical Visualization with Grafana https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/ Hadoop Performance for disaggregated compute and storage configurations based on IBM Spectrum Scale Storage https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/ EMS HA in ESS LE (Little Endian) environment https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/ What?s new in ESS 5.3.2 https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/ Administer your Spectrum Scale cluster easily https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/ Disaster Recovery using Spectrum Scale?s Active File Management https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/ Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS) https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/ Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1 https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 10/03/2018 08:48 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. 
https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Tue Jan 15 10:49:58 2019 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 15 Jan 2019 11:49:58 +0100 (CET) Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Message-ID: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 14 15:02:07 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 14 Jan 2019 15:02:07 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: Message-ID: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Hi Scott and Valdis (and everyone else), Thanks for your responses. Yes, we _could_ easily build a local naming scheme ? the name of the fileset matches the name of a folder in one of a couple of parent directories. However, an earlier response to my post asked if we?d be willing to share our script with the community and we would ? _if_ we can make it generic enough to be useful. Local naming schemes hardcoded in the script make it much less generically useful. Plus, it just seems to me that there ought to be a way to do this ? to get a list of fileset names from mmlsquota and then programmatically determine their junction path without having root privileges. GPFS has got to be storing that information somewhere, and I?m frankly quite surprised that no IBMer has responded with an answer to that. But I also know that when IBM is silent, there?s typically a reason. And yes, we could regularly create a static file ? in fact, that?s what we do now once per day (in the early morning hours). While this is not a huge deal - we only create / delete filesets a handful of times per month - on the day we do the script won?t function properly unless we manually update the file. I?m wanting to eliminate that, if possible ? which as I stated in the preceding paragraph, I have a hard time believing is not possible. I did look at the list of callbacks again (good thought!) and there?s not one specifically related to the creation / deletion of a fileset. There was only one that I saw that I think could even possibly be of use ? ccrFileChange. Can anyone on the list confirm or deny that the creation / deletion of a fileset would cause that callback to be triggered?? 
If it is triggered, then we could use that to update the static filesets within a minute or two of the change being made, which would definitely be acceptable. I realize that many things likely trigger a ccrFileChange, so I'm thinking of having a callback script that checks the current list of filesets against the static file and updates that appropriately. Thanks again for the responses... Kevin > On Jan 13, 2019, at 10:09 PM, Scott Goldman wrote: > > Kevin, > Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output. > > As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system. > > Sent from my BlackBerry - the most secure mobile device > > Original Message > From: valdis.kletnieks at vt.edu > Sent: January 12, 2019 4:07 PM > To: gpfsug-discuss at spectrumscale.org > Reply-to: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? > > On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: >> But from there I need to then be able to find out where that fileset is >> mounted in the directory tree so that I can see who the owner and group of that >> directory are. > > You're not able to leverage a local naming scheme? There's no connection between > the name of the fileset and where it is in the tree? I would hope there is, because > otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will > now be confused over what director(y/ies) need to be cleaned up. If your tool > says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at > /gpfs/foo/bar/baz then it's actionable. > > And if the user knows what the mapping is, your script can know it too.... >
From makaplan at us.ibm.com Tue Jan 15 14:46:18 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 15 Jan 2019 11:46:18 -0300 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: Personally, I agree that there ought to be a way in the product. In the meanwhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 15 15:11:41 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 15 Jan 2019 15:11:41 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset?
In-Reply-To: References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: <0D5558D9-9003-4B95-9A37-42321E03114D@vanderbilt.edu> Hi Marc (All), Yes, I can easily determine where filesets are linked here ? it is, as you said, in just one or two paths. The script as it stands now has been doing that for several years and only needs a couple of relatively minor tweaks to be even more useful to _us_ by whittling down a couple of edge cases relating to fileset creation / deletion. However ? there was a request to share the script with the broader community ? something I?m willing to do if I can get it in a state where it would be useful to others with little or no modification. Anybody who?s been on this list for any length of time knows how much help I?ve received from the community over the years. I truly appreciate that and would like to give back, even in a minor way, if possible. But in order to do that the script can?t be full of local assumptions ? that?s it in a nutshell ? that?s why I want to programmatically determine the junction path at run time as a non-root user. I?ll also mention here that early on in this thread Simon Thompson suggested looking into the REST API. Sure enough, you can get the information that way ? but, AFAICT, that would require the script to contain a username / password combination that would allow anyone with access to the script to then use that authentication information to access other information within GPFS that we probably don?t want them to have access to. If I?m mistaken about that, then please feel free to enlighten me. Thanks again? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 15, 2019, at 8:46 AM, Marc A Kaplan > wrote: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbd2c28fdb60041f3434e08d67af83b11%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636831603904557717&sdata=A74TTq%2FQvyhEMHaolklbiMAEnaGVuHNiyhVYfn4wRek%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Jan 15 15:36:39 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 15 Jan 2019 16:36:39 +0100 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors In-Reply-To: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> References: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> Message-ID: Hello Iban, the pmsensor and pmcollector packages together with the GUI dashboard and statistics pages are not designed to be a general monitoring solution. For example. 
in many places we are filtering for GPFS nodes that are known to be cluster members and we try to match host names to GPFS node names. This causes the lack of nodes in GUI charts you are experiencing. In addition. the CLI based setup and management of the sensors assume that sensor nodes are cluster nodes. We are not intending to open up the internal management and views for data outside the cluster in the futute.- The requirements to provide plotting, filtering, aggregation and calculation in a general plotting environment can be very diverse and we may not be able to handle this. So while we are flattered by the request to use our charting capabilities as a general solution, we propose to use tools like grafana as more general solution. Please note that the GUI charts and dashboards have URLs that allow them to be hyperlinked, so you could also combine other web based charting tools together with the GUI based charts. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Iban Cabrillo To: gpfsug-discuss Date: 15.01.2019 12:05 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D690169.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jan 15 15:57:39 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Jan 2019 15:57:39 +0000 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Message-ID: Understand that you don?t want to install Grafana on its own, but there is a GPFS Grafana bridge I believe that would allow you to include the GPFS collected data in a Grafana dashboard. So if not wanting to setup sensors for that data is the reason you don?t want Grafana, then using the bridge might pull the data you want? Simon From: on behalf of "cabrillo at ifca.unican.es" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 15 January 2019 at 11:05 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. 
Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Jan 16 08:16:58 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 16 Jan 2019 08:16:58 +0000 Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset? In-Reply-To: References: , <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15475476039319.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.154754760393110.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.154754760393111.png Type: image/png Size: 1134 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Jan 16 12:57:18 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Jan 2019 09:57:18 -0300 Subject: [gpfsug-discuss] Get fileset and other info via Rest API and/or GUI In-Reply-To: References: , <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: Good to know the "Rest" does it for us. Since I started working on GPFS internals and CLI utitlities around Release 3.x, I confess I never had need of the GUI or the Rest API server. In fact I do most of my work remotely via Putty/Xterm/Emacs and only once-in-a-while even have an XWindows or VNC server/view of a GPFS node! So consider any of my remarks in that context. So I certainly defer to others when it comes to Spectrum Scale GUIs, "Protocol" servers and such. If I'm missing anything great, perhaps some kind soul will send me a note offline from this public forum. --Marc.K of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Wed Jan 16 16:18:16 2019 From: spectrumscale at kiranghag.com (KG) Date: Wed, 16 Jan 2019 21:48:16 +0530 Subject: [gpfsug-discuss] Filesystem automount issues Message-ID: Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jan 16 16:33:25 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Jan 2019 11:33:25 -0500 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: What does the output of "mmlsmount all -L" show? 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: KG To: gpfsug main discussion list Date: 01/16/2019 11:19 AM Subject: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Wed Jan 16 18:14:39 2019 From: spectrumscale at kiranghag.com (KG) Date: Wed, 16 Jan 2019 23:44:39 +0530 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: It shows that the filesystem is not mounted On Wed, Jan 16, 2019, 22:03 Frederick Stock What does the output of "mmlsmount all -L" show? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: KG > To: gpfsug main discussion list > Date: 01/16/2019 11:19 AM > Subject: [gpfsug-discuss] Filesystem automount issues > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi > > IHAC running Scale 5.x on RHEL 7.5 > > One out of two filesystems (/home) does not get mounted automatically at > boot. (/home is scale filesystem) > > The scale log does mention that the filesystem is mounted but mount output > says otherwise. > > There are no entries for /home in fstab since we let scale mount it. > Automount on scale and filesystem both have been set to yes. > > Any pointers to troubleshoot would be appreciated. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jan 16 18:38:07 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Jan 2019 13:38:07 -0500 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: Would it be possible for you to include the output of "mmlsmount all -L" and "df -k" in your response? 
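While you are at it, it may also be worth confirming that the file system really is flagged for automatic mount and that GPFS itself is set to start at boot. A minimal set of checks would look something like this (the device name "home" below is only an example, substitute your actual device name):

    mmlsmount all -L        # which nodes report each file system as mounted
    df -k                   # what the operating system thinks is mounted
    mmlsfs home -A          # automatic mount option for the file system, should be "yes"
    mmlsconfig autoload     # whether GPFS starts (and mounts -A yes file systems) at boot

If mmlsfs shows -A as "no", then mmchfs home -A yes would be the usual fix.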
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: KG To: gpfsug main discussion list Date: 01/16/2019 01:15 PM Subject: Re: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org It shows that the filesystem is not mounted On Wed, Jan 16, 2019, 22:03 Frederick Stock To: gpfsug main discussion list Date: 01/16/2019 11:19 AM Subject: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 16 20:01:53 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 16 Jan 2019 21:01:53 +0100 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 17 11:35:13 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Jan 2019 11:35:13 +0000 Subject: [gpfsug-discuss] Node expels Message-ID: We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 17 11:46:19 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 17 Jan 2019 13:46:19 +0200 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. 
it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 17 13:28:15 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 17 Jan 2019 15:28:15 +0200 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlewars at us.ibm.com Thu Jan 17 14:30:45 2019 From: jlewars at us.ibm.com (John Lewars) Date: Thu, 17 Jan 2019 09:30:45 -0500 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 17 19:02:06 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Jan 2019 19:02:06 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: , Message-ID: So we've backed out a bunch of network tuning parameters we had set (based on the GPFS wiki pages), they've been set a while but um ... maybe they are causing issues. Secondly, we've noticed in dump tscomm that we see connection broken to a node, and then the node ID is usually the same node, which is a bit weird to me. 
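For anyone else following along, the per-connection and NIC level checks suggested above boil down to something like the following (the interface name and ring sizes are only examples, and the supported maximum varies by adapter):

    ss -t -i dst 10.20.0.58                  # kernel TCP state (cwnd, rto, retransmits) for sockets to that peer
    ethtool -S eth0 | grep -iE 'drop|err'    # NIC statistics, look for rx/tx drops and errors
    ethtool -g eth0                          # current versus maximum ring buffer sizes
    ethtool -G eth0 rx 4096 tx 4096          # raise the ring buffers, after checking the reported maximum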
We've also just updated firmware on the Intel nics (the x722) which is part of the Skylake board. And specifically its the newer skylake kit we see this problem on. We've a number of issues with the x722 firmware (like it won't even bring a link up when plugged into some of our 10GbE switches, but that's another story). We've also dropped the bonded links from these nodes, just in case its related... Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jlewars at us.ibm.com [jlewars at us.ibm.com] Sent: 17 January 2019 14:30 To: Tomer Perry; gpfsug main discussion list Cc: Yong Ze Chen Subject: Re: [gpfsug-discuss] Node expels >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels ________________________________ Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From orichards at pixitmedia.com Thu Jan 17 20:52:50 2019 From: orichards at pixitmedia.com (Orlando Richards) Date: Thu, 17 Jan 2019 20:52:50 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: <4e0ea3c4-3076-e9a0-55c3-58f98be96d9b@pixitmedia.com> Hi Simon, We've had to disable the offload's for Intel cards in many situations with the i40e drivers - Redhat have an article about it: https://access.redhat.com/solutions/3662011 ------- Orlando On 17/01/2019 19:02, Simon Thompson wrote: > So we've backed out a bunch of network tuning parameters we had set > (based on the GPFS wiki pages), they've been set a while but um ... > maybe they are causing issues. > > Secondly, we've noticed in dump tscomm that we see connection broken > to a node, and then the node ID is usually the same node, which is a > bit weird to me. > > We've also just updated firmware on the Intel nics (the x722) which is > part of the Skylake board. And specifically its the newer skylake kit > we see this problem on. We've a number of issues with the x722 > firmware (like it won't even bring a link up when plugged into some of > our 10GbE switches, but that's another story). > > We've also dropped the bonded links from these nodes, just in case its > related... > > Simon > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of > jlewars at us.ibm.com [jlewars at us.ibm.com] > *Sent:* 17 January 2019 14:30 > *To:* Tomer Perry; gpfsug main discussion list > *Cc:* Yong Ze Chen > *Subject:* Re: [gpfsug-discuss] Node expels > > >They always appear to be to a specific type of hardware with the same > Ethernet controller, > > That makes me think you might be seeing packet loss that could require > ring buffer tuning (the defaults and limits will differ with different > ethernet adapters). > > The expel section in the slides on this page has been expanded to > include a 'debugging expels section' (slides 19-20, which also > reference ring buffer tuning): > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 > > Regards, > John Lewars > Spectrum Scale Performance, IBM Poughkeepsie > > > > > From: Tomer Perry/Israel/IBM > To: gpfsug main discussion list > Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN > Date: 01/17/2019 08:28 AM > Subject: Re: [gpfsug-discuss] Node expels > ------------------------------------------------------------------------ > > > Hi, > > I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). > > As written on the slide: > One of the best ways to determine if a network layer problem is root > cause for an expel is to look at the low-level socket details dumped > in the ?extra? log data (mmfs dump all) saved as part of automatic > data collection on Linux GPFS nodes. > > So, the idea is that in expel situation, we dump the socket state from > the OS ( you can see the same using 'ss -i' for example). > In your example, it shows that the ca_state is 4, there are > retransmits, high rto and all the point to a network problem. 
> You can find more details here: > http://www.yonch.com/tech/linux-tcp-congestion-control-internals > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: ? ?+1 720 3422758 > Israel Tel: ? ? ?+972 3 9188625 > Mobile: ? ? ? ? +972 52 2554625 > > > > > > From: "Tomer Perry" > To: gpfsug main discussion list > Date: 17/01/2019 13:46 > Subject: Re: [gpfsug-discuss] Node expels > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Simon, > > Take a look at > _http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf_slide > 13. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: ? ?+1 720 3422758 > Israel Tel: ? ? ?+972 3 9188625 > Mobile: ? ? ? ? +972 52 2554625 > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" > Date: 17/01/2019 13:35 > Subject: [gpfsug-discuss] Node expels > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > We?ve recently been seeing quite a few node expels with messages of > the form: > > 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address > 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is > unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 > probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 > rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 > 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data > collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster > 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug > data to proto-pg-pf01.bear.cluster localNode > 2019-01-17_11:19:30.882+0000: [I] Calling user exit script > gpfsSendRequestToNodes: event sendRequestToNodes, Async command > /usr/lpp/mmfs/bin/mmcommon. > 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for > a commMsgCheckMessages reply from node 10.20.0.58 > proto-pg-pf01.bear.cluster. Sending expel message. > > On the client node, we see messages of the form: > > 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data > collection request from 10.10.0.33 > 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp > debug data on this node. > 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data > collection request from 10.10.0.33 > 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug > data on this node. > 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from > cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b > ber-les-nsd01-data.bb2.cluster in rds.gpfs.server > 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data > collection request from 10.20.0.56 > > They always appear to be to a specific type of hardware with the same > Ethernet controller, though the nodes are split across three data > centres and we aren?t seeing link congestion on the links between them. > > On the node I listed above, it?s not actually doing anything either as > the software on it is still being installed (i.e. it?s not doing GPFS > or any other IO other than a couple of home directories). > > Any suggestions on what ?(socket 153) state is unexpected? means? 
> > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jan 18 15:23:09 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Jan 2019 15:23:09 +0000 Subject: [gpfsug-discuss] DSS-G Message-ID: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Anyone out their with a DSS-G using SR650 servers? We have one and after some hassle we have finally got the access to the software downloads and I have been reading through the documentation to familiarize myself with the upgrade procedure. Skipping over the shear madness of that which appears to involved doing a complete netboot reisntall of the nodes for every upgrade, it looks like we have wrong hardware. It all came in a Lenovo rack with factory cabling so one assumes it would be correct. However the "Manufactoring Preload Procedure" document says The DSS-G installation scripts assume that IPMI access to the servers is set up through the first regular 1GbE Ethernet port of the server (marked with a green star in figure 21) in shared mode, not through the dedicated IPMI port under the first three PCIe slots of the SR650 server?s back, and not on the lower left side of the x3650 M5 server?s back. Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to the dedicated IPMI port. Oh great, reinstalling the OS for an update is already giving me the screaming heebie jeebies, but now my factory delivered setup is wrong. So in my book increased chance of the install procedure writing all over the disks during install and blowing away the NSD's. Last time I was involved in an net install of RHEL (well CentOS but makes little difference) onto a GPFS not with attached disks the installer wrote all over the NSD descriptors and destroyed the file system. So before one plays war with Lenovo for shipping an unsupported configuration I was wondering how other DSS-G's with SR650's have come from the factory. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Fri Jan 18 16:02:48 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Jan 2019 16:02:48 +0000 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Message-ID: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> I have several. One of mine was shipped for customer rack (which happened to be an existing Lenovo rack anyway), the other was based on 3560m5 so cabled differently then anyway (and its now a franken DSS-G as we upgraded the servers to SR650 and added an SSD tray, but I have so much non-standard Lenovo config stuff in our systems ....) If you bond the LOM ports together then you can't use the XCC in shared mode. But the installer scripts will make it shared when you reinstall/upgrade. Well, it can half work in some cases depending on how you have your switch connected. For example we set the switch to fail back to non-bond mode (relatively common now), which is find when the OS is not booted, you can talk to XCC. But as soon as the OS boots and it bonds, the switch port turns into a bond/trunk port and BAM, you can no longer talk to the XCC port. We have an xcat post script to put it back to being dedicated on the XCC port. So during install you lose access for a little while whilst the Lenovo script runs before my script puts it back again. And if you read the upgrade guide, then it tells you to unplug the SAS ports before doing the reinstall (OK I haven't checked the 2.2a upgrade guide, but it always did). HOWEVER, the xcat template for DSS-G should also black list the SAS driver to prevent it seeing the attached JBOD storage. AND GPFS now writes proper GPT headers as well to the disks which the installer should then leave alone. (But yes, haven't we all done an install and wiped the disk headers ... GPFS works great until you try to mount the file-system sometime later) On the needing to reinstall ... I agree I don't like the reinstall to upgrade between releases, but if you look what it's doing it sorta half makes sense. For example it force flashes an exact validated firmware onto the SAS cards and forces the port config etc onto the card to being in a known current state. I don't like it, but I see why it's done like that. We have in the past picked the relevant bits out (e.g. disk firmware and GPFS packages), and done just those, THIS IS NOT SUPPORTED, but we did pick it apart to see what had changed. If you go to 2.2a as well, the gui is now moved out (it was a bad idea to install on the DSS-G nodes anyway I'm sure), and the pmcollector package magically doesn't get installed either on the DSS-G nodes. Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will flash the firmware to Intel 4.0 release for the X722. And that doesn't work if you have Mellanox Ethernet switches running Cumulus. (we proved it was the firmware by upgrading another SR650 to the latest firmware and suddenly it no longer works) - you won't get a link up, even at PXE time so not a driver issue. And if you have a VDX switch you need another workaround ... Simon ?On 18/01/2019, 15:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: Anyone out their with a DSS-G using SR650 servers? We have one and after some hassle we have finally got the access to the software downloads and I have been reading through the documentation to familiarize myself with the upgrade procedure. 
Skipping over the shear madness of that which appears to involved doing a complete netboot reisntall of the nodes for every upgrade, it looks like we have wrong hardware. It all came in a Lenovo rack with factory cabling so one assumes it would be correct. However the "Manufactoring Preload Procedure" document says The DSS-G installation scripts assume that IPMI access to the servers is set up through the first regular 1GbE Ethernet port of the server (marked with a green star in figure 21) in shared mode, not through the dedicated IPMI port under the first three PCIe slots of the SR650 server?s back, and not on the lower left side of the x3650 M5 server?s back. Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to the dedicated IPMI port. Oh great, reinstalling the OS for an update is already giving me the screaming heebie jeebies, but now my factory delivered setup is wrong. So in my book increased chance of the install procedure writing all over the disks during install and blowing away the NSD's. Last time I was involved in an net install of RHEL (well CentOS but makes little difference) onto a GPFS not with attached disks the installer wrote all over the NSD descriptors and destroyed the file system. So before one plays war with Lenovo for shipping an unsupported configuration I was wondering how other DSS-G's with SR650's have come from the factory. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Fri Jan 18 17:14:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Jan 2019 17:14:52 +0000 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> Message-ID: <901117abe1768c9d02aae3b6cc9b5cf47dc3cc97.camel@strath.ac.uk> On Fri, 2019-01-18 at 16:02 +0000, Simon Thompson wrote: [SNIP] > > If you bond the LOM ports together then you can't use the XCC in > shared mode. But the installer scripts will make it shared when you > reinstall/upgrade. Well, it can half work in some cases depending on > how you have your switch connected. For example we set the switch to > fail back to non-bond mode (relatively common now), which is find > when the OS is not booted, you can talk to XCC. But as soon as the OS > boots and it bonds, the switch port turns into a bond/trunk port and > BAM, you can no longer talk to the XCC port. We don't have that issue :-) Currently there is nothing plugged into the LOM because we are using the Mellanox ConnectX4 card for bonded 40Gbps Ethernet to carry the GPFS traffic in the main with one of the ports on the two cards set to Infiniband so the storage can be mounted on an old cluster which only has 1Gb Ethernet (new cluster uses 10GbE networking to carry storage). However we have a shortage of 10GbE ports and the documentation says it should be 1GbE anyway, hence asking what Lenovo might have shipped to other people, as we have a disparity between what has been shipped and what the documentation says it should be like. 
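On the reinstall worry running through this thread: besides physically unplugging the SAS cables, a belt-and-braces option is to keep the installer from loading the HBA driver at all, so the JBOD LUNs and their NSD descriptors are never visible to it. A rough sketch, assuming the D3284 enclosures sit behind an mpt3sas-driven adapter (check first, the driver name is an assumption here):

    # confirm which kernel driver binds the SAS HBAs on the SR650s
    lspci -k | grep -A 3 -i sas

    # then append to the xCAT/kickstart kernel command line of the netboot image
    # so neither the installer nor the first boot can see the JBOD disks:
    #   modprobe.blacklist=mpt3sas rd.driver.blacklist=mpt3sas

    # after the upgrade, with the blacklist removed, sanity-check the NSDs
    # before starting GPFS cluster-wide:
    mmlsnsd -X      # every NSD should still resolve to a device path on its servers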
[SNIP] > And if you read the upgrade guide, then it tells you to unplug the > SAS ports before doing the reinstall (OK I haven't checked the 2.2a > upgrade guide, but it always did). Well the 2.2a documentation does not say anything about that :-) I had basically decided however it was going to be necessary for safety purposes. While I do have a full backup of the file system I don't want to have to use it. > HOWEVER, the xcat template for DSS-G should also black list the SAS > driver to prevent it seeing the attached JBOD storage. AND GPFS now > writes proper GPT headers as well to the disks which the installer > should then leave alone. (But yes, haven't we all done an install and > wiped the disk headers ... GPFS works great until you try to mount > the file-system sometime later) Well I have never wiped my NSD's, just the numpty getting ready to prepare the CentOS6 upgrade for the cluster forgot to unzone the storage arrays (cluster had FC attached storage to all nodes for performance reasons, back in the day 4Gb FC was a lot cheaper than 10GbE and 1GbE was not fast enough) and wiped it for me :-( > On the needing to reinstall ... I agree I don't like the reinstall to > upgrade between releases, but if you look what it's doing it sorta > half makes sense. For example it force flashes an exact validated > firmware onto the SAS cards and forces the port config etc onto the > card to being in a known current state. I don't like it, but I see > why it's done like that. Except that does not require a reinstall of the OS to achieve. Reinstalling from scratch for an update is complete madness IMHO. > > If you go to 2.2a as well, the gui is now moved out (it was a bad > idea to install on the DSS-G nodes anyway I'm sure), and the > pmcollector package magically doesn't get installed either on the > DSS-G nodes. > Currently we don't have the GUI installed anywhere. I am not sure I trust IBM yet to not change the GUI completely again to be bothered getting it to work. > Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will > flash the firmware to Intel 4.0 release for the X722. And that > doesn't work if you have Mellanox Ethernet switches running > Cumulus. (we proved it was the firmware by upgrading another SR650 > to the latest firmware and suddenly it no longer works) - you won't > get a link up, even at PXE time so not a driver issue. And if you > have a VDX switch you need another workaround ... > We have Lenovo switches, so hopefully Lenovo tested with their own switches work ;-) Mind you I get this running the dssgcktopology tool Warning: Unsupported configuration of odd number of enclosures detected. Which nitwit wrote that script then? From the "Manufacturing Preload Procedure" for 2.2a on page 9 For the high density DSS models DSS-G210, DSS-G220, DSS-G240 and DSS-G260 with 3.5? NL-SAS disks (7.2k RPM), the DSS-G building block contains one, two, four or six Lenovo D3284 disk enclosures. Right so what is it then? Because one enclosure which is clearly an odd number of enclosures is allegedly an unsupported configuration according to the tool, but supported according to the documentation!!! JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From matthew.robinson02 at gmail.com Fri Jan 18 19:25:35 2019 From: matthew.robinson02 at gmail.com (Matthew Robinson) Date: Fri, 18 Jan 2019 14:25:35 -0500 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Message-ID: Hi Jonathan, In the last DSS 2.x tarballs there should a PDG included. This should provide alot of detail going over the solutions configuration and common problems for troubleshooting. Or at least the Problem Determantion Guide was there be for my department let me go. The shared IMM port is pretty standard from the 3650 to the SD530's for the most part. You should have a port marked shared on either and the IPMI interace is to be shared mode for dual subnet masks on the same NIC. This is is the standard xcat configuration from Sourcforge. If I am not mistaken the PDG should be stored in the first DSS-G version tarball for reference. Hope this helps, Matthew Robinson On Fri, Jan 18, 2019 at 10:23 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > > Anyone out their with a DSS-G using SR650 servers? > > We have one and after some hassle we have finally got the access to the > software downloads and I have been reading through the documentation to > familiarize myself with the upgrade procedure. > > Skipping over the shear madness of that which appears to involved doing > a complete netboot reisntall of the nodes for every upgrade, it looks > like we have wrong hardware. It all came in a Lenovo rack with factory > cabling so one assumes it would be correct. > > However the "Manufactoring Preload Procedure" document says > > The DSS-G installation scripts assume that IPMI access to the > servers is set up through the first regular 1GbE Ethernet port > of the server (marked with a green star in figure 21) in shared > mode, not through the dedicated IPMI port under the first three > PCIe slots of the SR650 server?s back, and not on the lower left > side of the x3650 M5 server?s back. > > Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to > the dedicated IPMI port. Oh great, reinstalling the OS for an update is > already giving me the screaming heebie jeebies, but now my factory > delivered setup is wrong. So in my book increased chance of the install > procedure writing all over the disks during install and blowing away > the NSD's. Last time I was involved in an net install of RHEL (well > CentOS but makes little difference) onto a GPFS not with attached disks > the installer wrote all over the NSD descriptors and destroyed the file > system. > > So before one plays war with Lenovo for shipping an unsupported > configuration I was wondering how other DSS-G's with SR650's have come > from the factory. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Matthew Robinson Comptia A+, Net+ 919.909.0494 matthew.robinson02 at gmail.com The greatest discovery of my generation is that man can alter his life simply by altering his attitude of mind. - William James, Harvard Psychologist. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Mon Jan 21 15:59:29 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 21 Jan 2019 15:59:29 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Mon Jan 21 16:03:13 2019 From: spectrumscale at kiranghag.com (KG) Date: Mon, 21 Jan 2019 21:33:13 +0530 Subject: [gpfsug-discuss] Dr site using full replication? Message-ID: Hi Folks Has anyone replicated scale node to a dr site by replicating boot disks and nsd ? The same hostnames and ip subnet would be available on the other site and cluster should be able to operate from any one location at a time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 21 16:02:50 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 21 Jan 2019 16:02:50 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu> Hi All, I just wanted to follow up on this thread ? the only way I have found to obtain a list of filesets and their associated junction paths as a non-root user is via the REST API (and thanks to those who suggested that). 
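For anyone following along, the query in question looks roughly like the sketch below (endpoint per my reading of the 5.x REST API docs; gui-node, gpfs0 and restuser are placeholders); note the basic-auth credentials sitting right in the command line and process table:

    # fileset list, including junction paths, from the GUI/REST service as a non-root user
    curl -s -k -u restuser:SECRET \
        "https://gui-node:443/scalemgmt/v2/filesystems/gpfs0/filesets" \
        | python -m json.tool      # pretty-printing only, optional

Using curl -n with a mode-600 .netrc keeps the password out of the process table, but it still leaves a credential on disk that a determined user could reuse, so it only narrows the problem rather than solving it.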
However, AFAICT querying the REST API via a script would expose the username / password used to do so to anyone who bothered to look at the code, which would in turn allow a knowledgeable and curious user to query the REST API themselves for other information we do not necessarily want to expose to them. Therefore, it is not an acceptable solution to us. Therefore, unless someone responds with a way to allow a non-root user to obtain fileset junction paths that doesn?t involve the REST API, I?m afraid I?m at a dead end in terms of making our quota usage Python script something that I can share with the broader community. It just has too much site-specific code in it. Sorry? Kevin P.S. In case you?re curious about how the quota script is obtaining those junction paths ? we have a cron job that runs once per hour on the cluster manager that dumps the output of mmlsfileset to a text file, which the script then reads. The cron job used to just run once per day and used to just run mmlsfileset. I have modified it to be a shell script which checks for the load average on the cluster manager being less than 10 and that there are no waiters of more than 10 seconds duration. If both of those conditions are true, it runs mmlsfileset. If either are not, it simply exits ? the idea being that one or both of those would likely be true if something were going on with the cluster manager that would cause the mmlsfileset to hang. I have also modified the quota script itself so that it checks that the junction path for a fileset actually exists before attempting to stat it (duh - should?ve done that from the start), which handles the case where a user would run the quota script and it would bomb off with an exception because the fileset was deleted and the cron job hadn?t run yet. If a new fileset is created, well, it just won?t get checked by the quota script until the cron job runs successfully. We have decided that this is an acceptable compromise. On Jan 15, 2019, at 8:46 AM, Marc A Kaplan > wrote: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeverdon at us.ibm.com Mon Jan 21 22:41:26 2019 From: jeverdon at us.ibm.com (Jodi E Everdon) Date: Mon, 21 Jan 2019 17:41:26 -0500 Subject: [gpfsug-discuss] post to list Message-ID: Jodi Everdon IBM New Technology Introduction (NTI) 2455 South Road Client Experience Validation Poughkeepsie, NY 12601 Email: jeverdon at us.ibm.com North America IBM IT Infrastructure: www.ibm.com/it-infrastructure -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 15606074.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From scale at us.ibm.com Mon Jan 21 23:34:31 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 21 Jan 2019 15:34:31 -0800 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Message-ID: Hello Renar, A few things to try: Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=frR4WiYT89JSgLnJMtRAlESzRXWW2YatEwsuuV8M810&s=FSjMBxMo8G8y3VR2A59hgIWaHPKPFNHU7RXcneIVCPE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Tue Jan 22 07:36:15 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Tue, 22 Jan 2019 07:36:15 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu> References: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu>, <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128480.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128481.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128482.png Type: image/png Size: 1134 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jan 22 14:35:02 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Jan 2019 14:35:02 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> OK we think we might have a reason for this. We run iptables on some of our management function nodes, and we found that in some cases, our config management tool can cause a ?systemctl restart iptables? to occur (the rule ordering generation was non deterministic meaning it could shuffle rules ? we fixed that and made it reload rather than restart). Which takes a fraction of a second, but it appears that this is sufficient for GPFS to get into a state. What I didn?t mention before was that we could get it into a state where the only way to recover was to shutdown the storage cluster and restart it. I?m not sure why normal expel and recovery doesn?t appear to work in this case, though we?re not 100% certain that its iptables restart. (we just have a very smoky gun at present). (I have a ticket with that question open). Maybe it?s a combination of having a default DROP policy on iptables as well - we have also switched to ACCEPT and added a DROP rule at the end of the ruleset which gives the same result. Simon From: on behalf of "jlewars at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 17 January 2019 at 14:31 To: Tomer Perry , "gpfsug-discuss at spectrumscale.org" Cc: Yong Ze Chen Subject: Re: [gpfsug-discuss] Node expels >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). 
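For what it's worth, the usual quick checks for that look something like the sketch below; eth0 stands in for the GPFS daemon interface, and the usable maxima differ per controller and driver:

    # configured vs. hardware-maximum RX/TX ring sizes
    ethtool -g eth0

    # driver statistics; drop/fifo/missed counters growing in step with the
    # expels are a strong hint that the rings are too small for the burst load
    ethtool -S eth0 | grep -iE 'drop|fifo|miss|no_buffer'

    # raise the rings toward the reported maximum (not persistent by itself;
    # add it to the distribution's interface configuration to survive reboots)
    ethtool -G eth0 rx 4096 tx 4096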
The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels ________________________________ Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmoye at quantlab.com Tue Jan 22 15:43:26 2019 From: rmoye at quantlab.com (Roger Moye) Date: Tue, 22 Jan 2019 15:43:26 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Message-ID: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing "/cygdrive/..." * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. 
If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jan 22 17:10:24 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 22 Jan 2019 17:10:24 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> Message-ID: Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. 
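If that route is taken, the role change itself is only a couple of commands, along the lines of the hedged sketch below (node names are placeholders, and quorum changes are best made while the cluster is healthy):

    # demote the Windows nodes from the quorum role, promote Linux nodes instead
    mmchnode --nonquorum -N winnode1,winnode2
    mmchnode --quorum -N linuxnode1,linuxnode2,linuxnode3
    mmlscluster | grep -i quorum      # confirm the new quorum set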
-Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From Achim.Rehor at de.ibm.com Tue Jan 22 18:18:03 2019 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Tue, 22 Jan 2019 19:18:03 +0100 Subject: [gpfsug-discuss] Node expels In-Reply-To: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> References: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: 
From Renar.Grunenberg at huk-coburg.de Wed Jan 23 12:45:39 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 23 Jan 2019 12:45:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> Message-ID: <349cb338583a4c1d996677837fc65b6e@SMXRF105.msg.hukrf.de> 
Hallo All, as a point to the problem, it seems to be that all the delays are happening here: 
DEBUG=1 mmgetstate -a 
..... 
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 
Any pointers on this, or on whether it will be fixed in the near future, are welcome.
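A quick way to tell whether it is really the temp-file cleanup or simply Cygwin's process-spawn overhead (a common cause of slow shell-based tooling on Windows) is to time the pieces separately, for example:

    # run inside the same Cygwin shell the mm commands use
    time /bin/rm -f /var/mmfs/tmp/__does_not_exist.$$    # one fork/exec plus an unlink
    time for i in $(seq 1 20); do /bin/true; done         # twenty bare fork/execs, the raw spawn cost
    time DEBUG=1 mmgetstate -a > /tmp/trace.$$ 2>&1       # a full traced run for comparison

If the /bin/true loop already takes seconds, any script that spawns hundreds of small processes will crawl, and the place to look is the Cygwin/anti-virus side rather than the mm scripts themselves.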
Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). 
You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). 
All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Thu Jan 24 14:29:42 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 24 Jan 2019 14:29:42 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? Message-ID: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Hello, a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too. Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports using uid/gid of local accounts. We manage gids on the server side and allow NFS v3 client access only. I crosspost this to ganesha support and to the gpfsug mailing list. Thank you, Heiner Billich ganesha version: 2.5.3-ibm028.00.el7.x86_64 the ganesha config CacheInode { fd_hwmark_percent=60; fd_lwmark_percent=20; fd_limit_percent=90; lru_run_interval=90; entries_hwmark=1500000; } NFS_Core_Param { clustered=TRUE; rpc_max_connections=10000; heartbeat_freq=0; mnt_port=33247; nb_worker=256; nfs_port=2049; nfs_protocols=3,4; nlm_port=33245; rquota_port=33246; rquota_port=33246; short_file_handle=FALSE; mount_path_pseudo=true; } GPFS { fsal_grace=FALSE; fsal_trace=TRUE; } NFSv4 { delegations=FALSE; domainname=virtual1.com; grace_period=60; lease_lifetime=60; } Export_Defaults { access_type=none; anonymous_gid=-2; anonymous_uid=-2; manage_gids=TRUE; nfs_commit=FALSE; privilegedport=FALSE; protocols=3,4; sectype=sys; squash=root_squash; transports=TCP; } one export # === START /**** id=206 nclients=3 === EXPORT { Attr_Expiration_Time=60; Delegations=none; Export_id=206; Filesystem_id=42.206; MaxOffsetRead=18446744073709551615; MaxOffsetWrite=18446744073709551615; MaxRead=1048576; MaxWrite=1048576; Path="/****"; PrefRead=1048576; PrefReaddir=1048576; PrefWrite=1048576; Pseudo="/****"; Tag="****"; UseCookieVerifier=false; FSAL { Name=GPFS; } CLIENT { # === ****/X12SA === Access_Type=RW; Anonymous_gid=-2; Anonymous_uid=-2; Clients=X.Y.A.B/24; Delegations=none; Manage_Gids=TRUE; NFS_Commit=FALSE; PrivilegedPort=FALSE; Protocols=3; SecType=SYS; Squash=Root; Transports=TCP; } ?. 
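Not an answer to the nobody-mapping question, but a quick way to test the unknown-UID theory from the server side: with Manage_Gids=TRUE the CES node resolves the client's numeric UID into a full group list itself instead of trusting the list in the AUTH_SYS credential, so the first thing to verify on every protocol node is that the appliance's uid and gid resolve there at all. A hedged sketch, with 1234, localacct and /mnt/export standing in for the real values:

    # on each CES node serving the export
    getent passwd 1234     # must return an entry; an unresolvable uid is the suspected failure mode
    getent group 1234
    id 1234                # the group list the server would build for that uid

    # on the client, confirm which numeric ids actually go out on the wire
    su -s /bin/sh localacct -c 'id; touch /mnt/export/testfile'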
-- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From truongv at us.ibm.com Thu Jan 24 18:17:45 2019 From: truongv at us.ibm.com (Truong Vu) Date: Thu, 24 Jan 2019 13:17:45 -0500 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: Message-ID: Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. /bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. 
________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. 
Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20190123_eff7ad74_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=JWv1FytE6pkOdJtqJV5sSVf3ZwV0B9FDZmfzI7LQEGk&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Fri Jan 25 09:13:53 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 25 Jan 2019 09:13:53 +0000 Subject: [gpfsug-discuss] [NFS-Ganesha-Support] does ganesha deny access for unknown UIDs? In-Reply-To: <35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com> References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> <35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com> Message-ID: <1F7557E9-FE60-4F37-BA0A-FD4C37E124BD@psi.ch> Hello Daniel, thank you. The clients do NFS v3 mounts, hence idmap is no option - as I know it's used in NFS v4 to map between uid/guid and names only? For a process to switch to a certain uid/guid in general one does not need a matching passwd entry? I see that with ACLs you get issues as they use names, and you can't do a server-side group membership lookup, and there may be more subtle issues. Anyway, I'll create the needed accounts on the server. By the way: We had the same issue with Netapp filers and it took a while to find the configuration option to allow 'unknown' uid/gid to access a nfs v3 export. I'll try to reproduce on a test system with increased logging to see what exactly goes wrong and maybe ask later to add a configuration option to ganesha to switch to a behaviour more similar to kernel-nfs. Many client systems at my site are legacy and run various operating systems, hence a complete switch to NFS v4 is unlikely to happen soon. 
cheers, Heiner -- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch ?On 24/01/19 16:35, "Daniel Gryniewicz" wrote: Hi. For local operating FSALs (like GPFS and VFS), the way Ganesha makes sure that a UID/GID combo has the correct permissions for an operation is to set the UID/GID of the thread to the one in the operation, then perform the actual operation. This way, the kernel and the underlying filesystem perform atomic permission checking on the op. This setuid/setgid will fail, of course, if the local system doesn't have that UID/GID to set to. The solution for this is to use NFS idmap to map the remote ID to a local one. This includes the ability to map unknown IDs to some local ID. Daniel On 1/24/19 9:29 AM, Billich Heinrich Rainer (PSI) wrote: > Hello, > > a local account on a nfs client couldn?t write to a ganesha nfs export > even with directory permissions 777. The solution was to create the > account on the ganesha servers, too. > > Please can you confirm that this is the intended behaviour? is there an > option to change this and to map unknown accounts to nobody instead? We > often have embedded Linux appliances or similar as nfs clients which > need to place some data on the nfs exports using uid/gid of local accounts. > > We manage gids on the server side and allow NFS v3 client access only. > > I crosspost this to ganesha support and to the gpfsug mailing list. > > Thank you, > > Heiner Billich > > ganesha version: 2.5.3-ibm028.00.el7.x86_64 From andy_kurth at ncsu.edu Fri Jan 25 16:08:12 2019 From: andy_kurth at ncsu.edu (Andy Kurth) Date: Fri, 25 Jan 2019 11:08:12 -0500 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Message-ID: I believe this is occurring because of the manage_gids=TRUE setting. The purpose of this setting is to overcome the AUTH_SYS 16 group limit. If true, Ganesha takes the UID and resolves all of the GIDs on the server. If false, the GIDs sent by the client are used. I ran a quick test by creating a local user on the client and exporting 2 shares with 777 permissions, one with manage_gids=TRUE and one with FALSE. The user could view the share and create files with manage_gids=FALSE. ganesha.log showed that it tried and failed to resolve the UID to a name, but allowed the operation nonetheless: 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_uid_to_name failed with code -2. 
2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 779 failed, using numeric owner With manage_gids=TRUE, the client received permission denied and ganesha.log showed the GID query failing: 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No matching password record found for uid 779 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch managed_gids failed Hope this helps, Andy Kurth / NC State University On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello, > > > > a local account on a nfs client couldn?t write to a ganesha nfs export > even with directory permissions 777. The solution was to create the account > on the ganesha servers, too. > > > > Please can you confirm that this is the intended behaviour? is there an > option to change this and to map unknown accounts to nobody instead? We > often have embedded Linux appliances or similar as nfs clients which need > to place some data on the nfs exports using uid/gid of local accounts. > > > > We manage gids on the server side and allow NFS v3 client access only. > > > > I crosspost this to ganesha support and to the gpfsug mailing list. > > > > Thank you, > > > > Heiner Billich > > > > ganesha version: 2.5.3-ibm028.00.el7.x86_64 > > > > the ganesha config > > > > CacheInode > > { > > fd_hwmark_percent=60; > > fd_lwmark_percent=20; > > fd_limit_percent=90; > > lru_run_interval=90; > > entries_hwmark=1500000; > > } > > NFS_Core_Param > > { > > clustered=TRUE; > > rpc_max_connections=10000; > > heartbeat_freq=0; > > mnt_port=33247; > > nb_worker=256; > > nfs_port=2049; > > nfs_protocols=3,4; > > nlm_port=33245; > > rquota_port=33246; > > rquota_port=33246; > > short_file_handle=FALSE; > > mount_path_pseudo=true; > > } > > GPFS > > { > > fsal_grace=FALSE; > > fsal_trace=TRUE; > > } > > NFSv4 > > { > > delegations=FALSE; > > domainname=virtual1.com; > > grace_period=60; > > lease_lifetime=60; > > } > > Export_Defaults > > { > > access_type=none; > > anonymous_gid=-2; > > anonymous_uid=-2; > > manage_gids=TRUE; > > nfs_commit=FALSE; > > privilegedport=FALSE; > > protocols=3,4; > > sectype=sys; > > squash=root_squash; > > transports=TCP; > > } > > > > one export > > > > # === START /**** id=206 nclients=3 === > > EXPORT { > > Attr_Expiration_Time=60; > > Delegations=none; > > Export_id=206; > > Filesystem_id=42.206; > > MaxOffsetRead=18446744073709551615; > > MaxOffsetWrite=18446744073709551615; > > MaxRead=1048576; > > MaxWrite=1048576; > > Path="/****"; > > PrefRead=1048576; > > PrefReaddir=1048576; > > PrefWrite=1048576; > > Pseudo="/****"; > > Tag="****"; > > UseCookieVerifier=false; > > FSAL { > > Name=GPFS; > > } > > CLIENT { > > # === ****/X12SA === > > Access_Type=RW; > > Anonymous_gid=-2; > > Anonymous_uid=-2; > > Clients=X.Y.A.B/24; > > Delegations=none; > > Manage_Gids=TRUE; > > NFS_Commit=FALSE; > > PrivilegedPort=FALSE; > > Protocols=3; > > SecType=SYS; > > Squash=Root; > > Transports=TCP; > > } > > ?. 
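With Manage_Gids=TRUE, as in the export above, the server has to resolve the incoming uid locally before it can build the group list. A quick check on a CES node, assuming the node resolves users through the normal passwd/NSS path (779 is just the uid from the log above, any affected uid will do):

  # no output / non-zero exit means there is no passwd record for the uid
  getent passwd 779
  # same check, and also shows the gids that manage_gids would use
  id 779

If both fail, the permission denied that Heiner saw (despite 777 on the directory) is expected until the account exists on the server or manage_gids is turned off for that export.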
> > -- > > Paul Scherrer Institut > > Heiner Billich > > System Engineer Scientific Computing > > Science IT / High Performance Computing > > WHGA/106 > > Forschungsstrasse 111 > > 5232 Villigen PSI > > Switzerland > > > > Phone +41 56 310 36 02 > > heiner.billich at psi.ch > > https://www.psi.ch > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Andy Kurth* Research Storage Specialist NC State University Office of Information Technology P: 919-513-4090 311A Hillsborough Building Campus Box 7109 Raleigh, NC 27695 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jan 25 18:07:06 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 25 Jan 2019 18:07:06 +0000 Subject: [gpfsug-discuss] FW: 'Flash (Alert): IBM Spectrum Scale (GPFS) V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may result in undetected data corruption or... Message-ID: [cid:forums.png] gpfs at us.ibm.com created a topic named Flash (Alert): IBM Spectrum Scale (GPFS) V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may result in undetected data corruption or a recall failure in the General Parallel File System - Announce (GPFS - Announce) forum. Abstract IBM has identified a problem in IBM Spectrum Scale V4.1.1.0 through 5.0.1.1, in which under some conditions reading a DMAPI-migrated file may return zeroes instead of the actual data. Further, a DMAPI-migrate operation or writing to a DMAPI-migrated file may cause the size of the stub file to be updated incorrectly, which may cause a mismatch between the file size recorded in the stub file and in the migrated object. This may result in failure of a manual or transparent recall, when triggered by a subsequent read from or write to the file. See the complete bulletin at: http://www.ibm.com/support/docview.wss?uid=ibm10741243 Open this item Posting Date: Friday, January 25, 2019 at 11:31:20 AM EST To unsubscribe or change settings, please go to your developerWorks community Settings. This is a notification sent from developerWorks community. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 289 bytes Desc: image001.png URL: From S.J.Thompson at bham.ac.uk Fri Jan 25 18:28:27 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Jan 2019 18:28:27 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>, Message-ID: Note there are other limitations introduced by setting manage_gids. Whilst you get round the 16 group limit, instead ACLs are not properly interpreted to provide user access when an ACL is in place. In a PMR were told the only was around this would be to user sec_krb. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Andy Kurth [andy_kurth at ncsu.edu] Sent: 25 January 2019 16:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] does ganesha deny access for unknown UIDs? I believe this is occurring because of the manage_gids=TRUE setting. The purpose of this setting is to overcome the AUTH_SYS 16 group limit. 
If true, Ganesha takes the UID and resolves all of the GIDs on the server. If false, the GIDs sent by the client are used. I ran a quick test by creating a local user on the client and exporting 2 shares with 777 permissions, one with manage_gids=TRUE and one with FALSE. The user could view the share and create files with manage_gids=FALSE. ganesha.log showed that it tried and failed to resolve the UID to a name, but allowed the operation nonetheless: 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_uid_to_name failed with code -2. 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 779 failed, using numeric owner With manage_gids=TRUE, the client received permission denied and ganesha.log showed the GID query failing: 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No matching password record found for uid 779 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch managed_gids failed Hope this helps, Andy Kurth / NC State University On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) > wrote: Hello, a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too. Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports using uid/gid of local accounts. We manage gids on the server side and allow NFS v3 client access only. I crosspost this to ganesha support and to the gpfsug mailing list. Thank you, Heiner Billich ganesha version: 2.5.3-ibm028.00.el7.x86_64 the ganesha config CacheInode { fd_hwmark_percent=60; fd_lwmark_percent=20; fd_limit_percent=90; lru_run_interval=90; entries_hwmark=1500000; } NFS_Core_Param { clustered=TRUE; rpc_max_connections=10000; heartbeat_freq=0; mnt_port=33247; nb_worker=256; nfs_port=2049; nfs_protocols=3,4; nlm_port=33245; rquota_port=33246; rquota_port=33246; short_file_handle=FALSE; mount_path_pseudo=true; } GPFS { fsal_grace=FALSE; fsal_trace=TRUE; } NFSv4 { delegations=FALSE; domainname=virtual1.com; grace_period=60; lease_lifetime=60; } Export_Defaults { access_type=none; anonymous_gid=-2; anonymous_uid=-2; manage_gids=TRUE; nfs_commit=FALSE; privilegedport=FALSE; protocols=3,4; sectype=sys; squash=root_squash; transports=TCP; } one export # === START /**** id=206 nclients=3 === EXPORT { Attr_Expiration_Time=60; Delegations=none; Export_id=206; Filesystem_id=42.206; MaxOffsetRead=18446744073709551615; MaxOffsetWrite=18446744073709551615; MaxRead=1048576; MaxWrite=1048576; Path="/****"; PrefRead=1048576; PrefReaddir=1048576; PrefWrite=1048576; Pseudo="/****"; Tag="****"; UseCookieVerifier=false; FSAL { Name=GPFS; } CLIENT { # === ****/X12SA === Access_Type=RW; Anonymous_gid=-2; Anonymous_uid=-2; Clients=X.Y.A.B/24; Delegations=none; Manage_Gids=TRUE; NFS_Commit=FALSE; PrivilegedPort=FALSE; Protocols=3; SecType=SYS; Squash=Root; Transports=TCP; } ?. 
-- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Andy Kurth Research Storage Specialist NC State University Office of Information Technology P: 919-513-4090 311A Hillsborough Building Campus Box 7109 Raleigh, NC 27695 From mnaineni at in.ibm.com Fri Jan 25 19:38:27 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 25 Jan 2019 19:38:27 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: References: , <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Sat Jan 26 01:32:59 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Sat, 26 Jan 2019 09:32:59 +0800 Subject: [gpfsug-discuss] Announcing 2019 March 11th Singapore Spectrum Scale User Group event - call for user case speakers Message-ID: Hello, This is the announcement for the Spectrum Scale Usergroup Singapore on Monday 11th March 2019, Suntec Convention and Exhibition Centre, Singapore. This event is being held in conjunction with SCA19 https://sc-asia.org/ All current Singapore Spectrum Scale User Group event details can be found here: http://bit.ly/2FRur9d We are calling for user case speakers please ? let Ulf, Xiang or myself know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. We are looking forwards to a great Usergroup in a fabulous venue. Thanks again to NSCC and IBM for helping to arrange the venue and event booking. Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au From Renar.Grunenberg at huk-coburg.de Mon Jan 28 08:36:45 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 28 Jan 2019 08:36:45 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: Message-ID: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> Hallo Truong Vu, unfortunality the results are the same, the cmd-responce are not what we want. Ok, we want to analyze something with the trace facility and came to following link in the knowledge center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm The docu mentioned that we must copy to windows files, tracefmt.exe and tracelog.exe, but the first one are only available in the DDK-Version 7.1 (W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify where I can find the mentioned files. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu Gesendet: Donnerstag, 24. Januar 2019 19:18 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. [Inactive hide details for gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gp]gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" > To: 'gpfsug main discussion list' > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. 
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' > Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. 
-Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From scale at us.ibm.com Tue Jan 29 00:20:47 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 28 Jan 2019 16:20:47 -0800 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> References: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> Message-ID: Hello Renar, I have WDK 8.1 installed and it does come with trace*.exe. Check this out: https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/tracefmt If not the WDK, did you try your SDK/VisualStudio folders as indicated in the above link? Nevertheless, I have uploaded trace*.exe here for you to download: ftp testcase.software.ibm.com. Login as anonymous and provide your email as password. cd /fromibm/aix. mget trace*.exe. This site gets scrubbed often, hence download soon before they get deleted. 
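Spelled out as a session, the download is roughly the following (binary and prompt are only the usual precautions for mget in a command-line ftp client, nothing specific to this server):

  ftp testcase.software.ibm.com
  Name: anonymous
  Password: <your email address>
  ftp> cd /fromibm/aix
  ftp> binary
  ftp> prompt
  ftp> mget trace*.exe
  ftp> bye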
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "gpfsug-discuss at spectrumscale.org" Date: 01/28/2019 12:38 AM Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Truong Vu, unfortunality the results are the same, the cmd-responce are not what we want. Ok, we want to analyze something with the trace facility and came to following link in the knowledge center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm The docu mentioned that we must copy to windows files, tracefmt.exe and tracelog.exe, but the first one are only available in the DDK-Version 7.1 (W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify where I can find the mentioned files. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu Gesendet: Donnerstag, 24. Januar 2019 19:18 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190123/eff7ad74/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=_PEp_I-F3uzCglEj5raDY1xo2-W6myUCIX1ysChh0lo&s=k9JU3wc7KoJj1VWVVSjjAekQcIEfeJazMkT3BBME-SY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From cblack at nygenome.org Tue Jan 29 17:23:49 2019 From: cblack at nygenome.org (Christopher Black) Date: Tue, 29 Jan 2019 17:23:49 +0000 Subject: [gpfsug-discuss] Querying size of snapshots Message-ID: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> We have some large filesets (PB+) and filesystems where I would like to monitor delete rates and estimate how much space we will get back as snapshots expire. We only keep 3-4 daily snapshots on this filesystem due to churn. I?ve tried to query the sizes of snapshots using the following command: mmlssnapshot fsname -d --block-size 1T However, this has run for over an hour without producing any results. Metadata is all on flash and I?m not sure why this is taking so long. Does anyone have any insight on this or alternate methods for getting estimates of snapshot sizes? Best, Chris PS I am aware of the warning in docs about the -d option. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Jan 29 18:24:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 29 Jan 2019 15:24:17 -0300 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> Message-ID: 1. First off, let's RTFM ... -d Displays the amount of storage that is used by the snapshot. This operation requires an amount of time that is proportional to the size of the file system; therefore, it can take several minutes or even hours on a large and heavily-loaded file system. This optional parameter can impact overall system performance. Avoid running the mmlssnapshot command with this parameter frequently or during periods of high file system activity. SOOOO.. there's that. 2. Next you may ask, HOW is that? Snapshots are maintained with a "COW" strategy -- They are created quickly, essentially just making a record that the snapshot was created and at such and such time -- when the snapshot is the same as the "live" filesystem... 
Then over time, each change to a block of data in live system requires that a copy is made of the old data block and that is associated with the most recently created snapshot.... SO, as more and more changes are made to different blocks over time the snapshot becomes bigger and bigger. How big? Well it seems the current implementation does not keep a "simple counter" of the number of blocks -- but rather, a list of the blocks that were COW'ed.... So when you come and ask "How big"... GPFS has to go traverse the file sytem metadata and count those COW'ed blocks.... 3. So why not keep a counter? Well, it's likely not so simple. For starters GPFS is typically running concurrently on several or many nodes... And probably was not deemed worth the effort ..... IF a convincing case could be made, I'd bet there is a way... to at least keep approximate numbers, log records, exact updates periodically, etc, etc -- similar to the way space allocation and accounting is done for the live file system... -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Tue Jan 29 18:43:24 2019 From: cblack at nygenome.org (Christopher Black) Date: Tue, 29 Jan 2019 18:43:24 +0000 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> Message-ID: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Thanks for the quick and detailed reply! I had read the manual and was aware of the warnings about -d (mentioned in my PS). On systems with high churn (lots of temporary files, lots of big and small deletes along with many new files), I?ve previously used estimates of snapshot size as a useful signal on whether we can expect to see an increase in available space over the next few days as snapshots expire. I?ve used this technique on a few different more mainstream storage systems, but never on gpfs. I?d find it useful to have a similar way to monitor ?space to be freed pending snapshot deletes? on gpfs. It sounds like there is not an existing solution for this so it would be a request for enhancement. I?m not sure how much overhead there would be keeping a running counter for blocks changed since snapshot creation or if that would completely fall apart on large systems or systems with many snapshots. If that is a consideration even having only an estimate for the oldest snapshot would be useful, but I realize that can depend on all the other later snapshots as well. Perhaps an overall ?size of all snapshots? would be easier to manage and would still be useful to us. I don?t need this number to be 100% accurate, but a low or floor estimate would be very useful. Is anyone else interested in this? Do other people have other ways to estimate how much space they will get back as snapshots expire? Is there a more efficient way of making such an estimate available to admins other than running an mmlssnapshot -d every night and recording the output? Thanks all! Chris From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Tuesday, January 29, 2019 at 1:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Querying size of snapshots 1. First off, let's RTFM ... -d Displays the amount of storage that is used by the snapshot. This operation requires an amount of time that is proportional to the size of the file system; therefore, it can take several minutes or even hours on a large and heavily-loaded file system. 
This optional parameter can impact overall system performance. Avoid running the mmlssnapshot command with this parameter frequently or during periods of high file system activity. SOOOO.. there's that. 2. Next you may ask, HOW is that? Snapshots are maintained with a "COW" strategy -- They are created quickly, essentially just making a record that the snapshot was created and at such and such time -- when the snapshot is the same as the "live" filesystem... Then over time, each change to a block of data in live system requires that a copy is made of the old data block and that is associated with the most recently created snapshot.... SO, as more and more changes are made to different blocks over time the snapshot becomes bigger and bigger. How big? Well it seems the current implementation does not keep a "simple counter" of the number of blocks -- but rather, a list of the blocks that were COW'ed.... So when you come and ask "How big"... GPFS has to go traverse the file sytem metadata and count those COW'ed blocks.... 3. So why not keep a counter? Well, it's likely not so simple. For starters GPFS is typically running concurrently on several or many nodes... And probably was not deemed worth the effort ..... IF a convincing case could be made, I'd bet there is a way... to at least keep approximate numbers, log records, exact updates periodically, etc, etc -- similar to the way space allocation and accounting is done for the live file system... ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Jan 29 19:19:12 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 29 Jan 2019 20:19:12 +0100 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Message-ID: You could put snapshot data in a separate storage pool. Then it should be visible how much space it occupies, but it?s a bit hard to see how this will be usable/manageable.. -jf tir. 29. jan. 2019 kl. 20:08 skrev Christopher Black : > Thanks for the quick and detailed reply! I had read the manual and was > aware of the warnings about -d (mentioned in my PS). > > On systems with high churn (lots of temporary files, lots of big and small > deletes along with many new files), I?ve previously used estimates of > snapshot size as a useful signal on whether we can expect to see an > increase in available space over the next few days as snapshots expire. > I?ve used this technique on a few different more mainstream storage > systems, but never on gpfs. > > I?d find it useful to have a similar way to monitor ?space to be freed > pending snapshot deletes? on gpfs. It sounds like there is not an existing > solution for this so it would be a request for enhancement. 
> > I?m not sure how much overhead there would be keeping a running counter > for blocks changed since snapshot creation or if that would completely fall > apart on large systems or systems with many snapshots. If that is a > consideration even having only an estimate for the oldest snapshot would be > useful, but I realize that can depend on all the other later snapshots as > well. Perhaps an overall ?size of all snapshots? would be easier to manage > and would still be useful to us. > > I don?t need this number to be 100% accurate, but a low or floor estimate > would be very useful. > > > > Is anyone else interested in this? Do other people have other ways to > estimate how much space they will get back as snapshots expire? Is there a > more efficient way of making such an estimate available to admins other > than running an mmlssnapshot -d every night and recording the output? > > > > Thanks all! > > Chris > > > > *From: * on behalf of Marc A > Kaplan > *Reply-To: *gpfsug main discussion list > *Date: *Tuesday, January 29, 2019 at 1:24 PM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] Querying size of snapshots > > > > 1. First off, let's RTFM ... > > *-d *Displays the amount of storage that is used by the snapshot. > This operation requires an amount of time that is proportional to the size > of the file system; therefore, > it can take several minutes or even hours on a large and heavily-loaded > file system. > This optional parameter can impact overall system performance. Avoid > running the * mmlssnapshot* > command with this parameter frequently or during periods of high file > system activity. > > SOOOO.. there's that. > > 2. Next you may ask, HOW is that? > > Snapshots are maintained with a "COW" strategy -- They are created > quickly, essentially just making a record that the snapshot was created and > at such and such time -- when the snapshot is the same as the "live" > filesystem... > > Then over time, each change to a block of data in live system requires > that a copy is made of the old data block and that is associated with the > most recently created snapshot.... SO, as more and more changes are made > to different blocks over time the snapshot becomes bigger and bigger. How > big? Well it seems the current implementation does not keep a "simple > counter" of the number of blocks -- but rather, a list of the blocks that > were COW'ed.... So when you come and ask "How big"... GPFS has to go > traverse the file sytem metadata and count those COW'ed blocks.... > > 3. So why not keep a counter? Well, it's likely not so simple. For > starters GPFS is typically running concurrently on several or many > nodes... And probably was not deemed worth the effort ..... IF a > convincing case could be made, I'd bet there is a way... to at least keep > approximate numbers, log records, exact updates periodically, etc, etc -- > similar to the way space allocation and accounting is done for the live > file system... > > > ------------------------------ > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jan 29 21:37:08 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 29 Jan 2019 22:37:08 +0100 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org><369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Wed Jan 30 13:16:22 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 30 Jan 2019 13:16:22 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space Message-ID: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rg1 Type: application/octet-stream Size: 13340 bytes Desc: rg1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rg2 Type: application/octet-stream Size: 13340 bytes Desc: rg2 URL: From abeattie at au1.ibm.com Wed Jan 30 14:53:47 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 30 Jan 2019 14:53:47 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Jan 30 20:25:20 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 30 Jan 2019 15:25:20 -0500 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Message-ID: Alvise, Could you send us the output of the following commands from both server nodes. mmfsadm dump nspdclient > /tmp/dump_nspdclient. mmfsadm dump pdisk > /tmp/dump_pdisk. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2019 08:24 AM Subject: [gpfsug-discuss] Unbalanced pdisk free space Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise [attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment "rg2" deleted by Brian Herr/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=QDZ-afehEgpYi3JGRd8q6rHgo4rb8gVu_VKQwg4MwEs&s=5bEFHRU7zk-nRK_d20vJBngQOOkSLWT1vvtcDNKD584&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 30 20:51:49 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 30 Jan 2019 20:51:49 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= Message-ID: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 30 21:02:26 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 30 Jan 2019 21:02:26 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Message-ID: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. 
So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Jan 30 21:16:51 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 30 Jan 2019 18:16:51 -0300 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Message-ID: We have (pre)shutdown and pre(startup) ... Trap and record both... If you see a startup without a matching shutdown you know the shutdown never happened, because GPFS crashed. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 01/30/2019 05:52 PM Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Sent by: gpfsug-discuss-bounces at spectrumscale.org Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=oBQHDWo5PVKthJjmbVrQyqSrkuFZEcMQb_tXtvcKepE&s=HfF_wArTvc-i4wLfATXbwrImRT-w0mKG8mhctBJFLCI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Wed Jan 30 21:52:48 2019 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Wed, 30 Jan 2019 21:52:48 +0000 Subject: [gpfsug-discuss] =?windows-1252?q?Node_=91crash_and_restart=92_ev?= =?windows-1252?q?ent_using_GPFS_callback=3F?= In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>, <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: <063B3F21-8695-4454-8D1A-B1734B1AD436@med.mun.ca> Could you get away with running ?mmdiag ?stats? and inspecting the uptime information it provides? Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. 
John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 On Jan 30, 2019, at 5:32 PM, Sanchez, Paul > wrote: There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jan 31 01:19:47 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 31 Jan 2019 01:19:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= Message-ID: <554E186D-30BD-4E7D-859C-339F5DDAD442@nuance.com> Actually, I think ?preShutdown? will do it since it passes the type of shutdown ?abnormal? for a crash to the call back - I can use that to send a Slack message. mmaddcallback node-abort --event preShutdown --command /usr/local/sbin/callback-test.sh --parms "%eventName %reason" and you get either: preShutdown normal preShutdown abnormal Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, January 30, 2019 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? We have (pre)shutdown and pre(startup) ... Trap and record both... If you see a startup without a matching shutdown you know the shutdown never happened, because GPFS crashed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alastair.smith at ucl.ac.uk Wed Jan 30 14:11:08 2019 From: alastair.smith at ucl.ac.uk (Smith, Alastair) Date: Wed, 30 Jan 2019 14:11:08 +0000 Subject: [gpfsug-discuss] Job opportunity at UCL Research Data Services Message-ID: Dear List Members, We would like to draw you attention to a job opportunity at UCL for a Senior Research Data Systems Engineer. The is a technical role in the Research Data Services Group, part of UCL's large and well-established Research IT Services team. 
The Senior Data Systems Engineer leads the development of technical strategy for Research Data Services at UCL. The successful applicant will ensure that appropriate technologies and workflows are used to address research data management requirements across the institution, particularly those relating to data storage and access. The Research Data Services Group provides petabyte-scale data storage for active research projects, and is about to launch a long-term data repository service. Over the coming years, the Group will be building an integrated suite of services to support data management from planning to re-use, and the successful candidate will play an important role in the design and operation of these services. The post comes with a competitive salary and a central London working location. The closing date for applications it 2nd February. Further particulars and a link to the application form are available from https://tinyurl.com/ucljobs-rdse. -|-|-|-|-|-|-|-|-|-|-|-|-|- Dr Alastair Smith Senior research data systems engineer Research Data Services, RITS Information Services Division University College London 1 St Martin's- Le-Grand London EC1A 4AS -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Thu Jan 31 09:48:12 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 31 Jan 2019 09:48:12 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE8045267E32C0@MBX114.d.ethz.ch> They're attached. Thanks! Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com] Sent: Wednesday, January 30, 2019 9:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unbalanced pdisk free space Alvise, Could you send us the output of the following commands from both server nodes. * mmfsadm dump nspdclient > /tmp/dump_nspdclient. * mmfsadm dump pdisk > /tmp/dump_pdisk. * Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2019 08:24 AM Subject: [gpfsug-discuss] Unbalanced pdisk free space Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise [attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment "rg2" deleted by Brian Herr/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_nspdclient.sf-dssio-1 Type: application/octet-stream Size: 570473 bytes Desc: dump_nspdclient.sf-dssio-1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_nspdclient.sf-dssio-2 Type: application/octet-stream Size: 566924 bytes Desc: dump_nspdclient.sf-dssio-2 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_pdisk.sf-dssio-1 Type: application/octet-stream Size: 682312 bytes Desc: dump_pdisk.sf-dssio-1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_pdisk.sf-dssio-2 Type: application/octet-stream Size: 619497 bytes Desc: dump_pdisk.sf-dssio-2 URL: From heiner.billich at psi.ch Thu Jan 31 14:56:21 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 31 Jan 2019 14:56:21 +0000 Subject: [gpfsug-discuss] Token manager - how to monitor performance? Message-ID: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch> Hello, Sorry for coming up with this never-ending story. I know that token management is mainly autoconfigured and even the placement of token manager nodes is no longer under user control in all cases. Still I would like to monitor this component to see if we are close to some limit like memory or rpc rate. Especially as we?ll do some major changes to our setup soon. I would like to monitor the performance of our token manager nodes to get warned _before_ we get performance issues. Any advice is welcome. Ideally I would like collect some numbers and pass them on to influxdb or similar. I didn?t find anything in perfmon/zimon that seemed to match. I could imagine that numbers like ?number of active tokens? and ?number of token operations? per manager would be helpful. Or ?# of rpc calls per second?. And maybe ?number of open files?, ?number of token operations?, ?number of tokens? for clients. And maybe some percentage of used token memory ? and cache hit ratio ? This would also help to tune ? 
like if a client does very many token operations or rpc calls maybe I should increase maxFilesToCache. The above is just to illustrate, as token management is complicated the really valuable metrics may be different. Or am I too anxious and should wait and see instead? cheers, Heiner Heiner Billich -- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 31 15:11:24 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 31 Jan 2019 17:11:24 +0200 Subject: [gpfsug-discuss] Token manager - how to monitor performance? In-Reply-To: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch> References: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch> Message-ID: Hi, I agree that we should potentially add mode metrics, but for a start, I would look into mmdiag --memory and mmdiag --tokenmgr (the latter show different output on a token server). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Billich Heinrich Rainer (PSI)" To: gpfsug main discussion list Date: 31/01/2019 16:56 Subject: [gpfsug-discuss] Token manager - how to monitor performance? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Sorry for coming up with this never-ending story. I know that token management is mainly autoconfigured and even the placement of token manager nodes is no longer under user control in all cases. Still I would like to monitor this component to see if we are close to some limit like memory or rpc rate. Especially as we?ll do some major changes to our setup soon. I would like to monitor the performance of our token manager nodes to get warned _before_ we get performance issues. Any advice is welcome. Ideally I would like collect some numbers and pass them on to influxdb or similar. I didn?t find anything in perfmon/zimon that seemed to match. I could imagine that numbers like ?number of active tokens? and ?number of token operations? per manager would be helpful. Or ?# of rpc calls per second?. And maybe ?number of open files?, ?number of token operations?, ?number of tokens? for clients. And maybe some percentage of used token memory ? and cache hit ratio ? This would also help to tune ? like if a client does very many token operations or rpc calls maybe I should increase maxFilesToCache. The above is just to illustrate, as token management is complicated the really valuable metrics may be different. Or am I too anxious and should wait and see instead? cheers, Heiner Heiner Billich -- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=J5n3Wsk1f6CsyL867jkmS3P2BYZDfkPS6GB9dShnYcI&s=YFTWUM3MQu8C1MitRnyPnYQ_wMtjj3Uwmif6gJUoLgc&e= -------------- next part -------------- An HTML attachment was scrubbed... 
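A minimal collection sketch along the lines Tomer suggests: run the two mmdiag commands from cron on the token manager nodes and keep timestamped copies, so trends can be reviewed later or fed into a time-series database such as influxdb. The output directory, interval and retention below are placeholders, and which fields are worth graphing depends on the mmdiag output of the Scale release in use, so treat this as a starting point rather than a finished exporter.

#!/bin/bash
# Sketch: capture token manager diagnostics for later trending.
# Assumes it runs on a token manager node from cron, e.g.:
#   */5 * * * * /usr/local/sbin/capture-tokenmgr-stats.sh
# OUTDIR is a placeholder; point it somewhere with enough space.
OUTDIR=/var/log/gpfs-tokenmgr
TS=$(date +%Y%m%d-%H%M%S)
mkdir -p "$OUTDIR"
# Raw snapshots; parse or graph offline once you know which fields matter.
/usr/lpp/mmfs/bin/mmdiag --tokenmgr > "$OUTDIR/tokenmgr.$TS" 2>&1
/usr/lpp/mmfs/bin/mmdiag --memory   > "$OUTDIR/memory.$TS"   2>&1
# Keep roughly a week of snapshots (placeholder retention).
find "$OUTDIR" -type f -mtime +7 -delete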
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 30 21:15:48 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 30 Jan 2019 21:15:48 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: Hi Bob, We use the nodeLeave callback to detect node expels ? for what you?re wanting to do I wonder if nodeJoin might work?? If a node joins the cluster and then has an uptime of a few minutes you could go looking in /tmp/mmfs. HTH... -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 30, 2019, at 3:02 PM, Sanchez, Paul > wrote: There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cccd012a939124326a53908d686f64117%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636844789557921185&sdata=9bMPd%2F%2B%2Babt6IdeFYcdznPBQwPrMLFsXHTBYISlyYGM%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jan 31 15:40:50 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 31 Jan 2019 12:40:50 -0300 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com><9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: Various "leave" / join events may be interesting ... But you've got to consider that an abrupt failure of several nodes is not necessarily recorded anywhere! For example, because the would be recording devices might all lose power at the same time. 
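A rough sketch of the preStartup/uptime approach Paul and Kevin describe, for anyone who wants to experiment with it. The callback name, script path, thresholds and the Slack webhook are all placeholders; preStartup runs locally on the node that is starting GPFS, so the checks only look at the local host, and (as Marc points out) none of this catches failures that are never recorded anywhere.

#!/bin/bash
# /usr/local/sbin/gpfs-prestartup-check.sh  (path and name are placeholders)
# Register with something like:
#   mmaddcallback crashCheck --event preStartup \
#     --command /usr/local/sbin/gpfs-prestartup-check.sh
#
# Heuristic: if the host has been up for a while but GPFS is only starting
# now, and/or there are fresh dumps in /tmp/mmfs, assume mmfsd restarted
# after an abort and raise an alert.
UPTIME_SECS=$(cut -d. -f1 /proc/uptime)
THRESHOLD=600                      # placeholder: 10 minutes
RECENT_DUMPS=$(find /tmp/mmfs -maxdepth 1 -type f -mmin -60 2>/dev/null | wc -l)

if [ "$UPTIME_SECS" -gt "$THRESHOLD" ] || [ "$RECENT_DUMPS" -gt 0 ]; then
    MSG="GPFS restart on $(hostname -s): uptime=${UPTIME_SECS}s, recent dumps in /tmp/mmfs=${RECENT_DUMPS}"
    logger -t gpfs-callback "$MSG"
    # Optional: post to a Slack incoming webhook (URL is a placeholder).
    # curl -s -X POST -H 'Content-type: application/json' \
    #   --data "{\"text\": \"$MSG\"}" https://hooks.slack.com/services/XXX/YYY/ZZZ
fi
exit 0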
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jan 31 15:46:38 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 31 Jan 2019 15:46:38 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: <572FF01C-A82D-45FD-AB34-A897BFE59325@nuance.com> A better way to detect node expels is to install the expelnode into /var/mmfs/etc/ (sample in /usr/lpp/mmfs/samples/expelnode.sample) - put this on your manager nodes. It runs on every expel and you can customize it pretty easily. We generate a Slack message to a specific channel: GPFS Node Expel nrg1 APP [1:56 AM] nrg1-gpfs01 Expelling node gnj-r05r05u30, other node cnt-r04r08u40 Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, January 31, 2019 at 9:19 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Hi Bob, We use the nodeLeave callback to detect node expels ? for what you?re wanting to do I wonder if nodeJoin might work?? If a node joins the cluster and then has an uptime of a few minutes you could go looking in /tmp/mmfs. HTH... -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 30, 2019, at 3:02 PM, Sanchez, Paul > wrote: There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. 
What I'm really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Thu Jan 31 20:44:25 2019 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Thu, 31 Jan 2019 20:44:25 +0000 Subject: [gpfsug-discuss] Call for input & save the date Message-ID: <213C4D17-C0D2-4883-834F-7E2E00B4EE3F@spectrumscale.org> Hi All, We've just published the main dates for 2019 Spectrum Scale meetings on the user group website at: https://www.spectrumscaleug.org/ Please take a look over the list of events and pencil them in your diary! (Some of those later in the year are tentative, and there are a couple more that might get added in some other territories.) Myself, Kristy, Bob, Chris and Ulf are currently having some discussion on the topics we'd like to have covered in the various user group meetings. If you have any specific topics you'd like to hear about, then please let me know in the next few days; we can't promise we can get a speaker, but if you don't let us know we can't try! As usual, we'll be looking for user speakers for all of our events. The user group events only work well if we have people talking about their uses of Spectrum Scale, so please think about offering a talk and let us know! Thanks Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From roblogie at au1.ibm.com Tue Jan 8 21:49:51 2019 From: roblogie at au1.ibm.com (Rob Logie) Date: Tue, 8 Jan 2019 21:49:51 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Message-ID: Hi All Is there a way to enable user login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes themselves. Thanks! Regards, Rob Logie IT Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Tue Jan 8 21:53:51 2019 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 8 Jan 2019 16:53:51 -0500 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: Message-ID: Adding Ingo Meents for response From: "Rob Logie" To: gpfsug-discuss at spectrumscale.org Date: 01/08/2019 04:50 PM Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All Is there a way to enable user login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes themselves. Thanks! Regards, Rob Logie IT Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From arc at b4restore.com Wed Jan 9 10:25:13 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 10:25:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: Hi, I seem to be unable to find any information on separating protocol services onto specific CES nodes within a cluster. Does anyone know if it is possible to take, let's say, four of the CES nodes within a cluster, divide them into two pairs, and have two of them running SMB and the other two running OBJ instead of having all of them run both services? If it is possible it would be great to hear the pros and cons of doing this.
Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Jan 9 11:16:49 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 9 Jan 2019 11:16:49 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 12:19:30 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 12:19:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: on behalf of "arc at b4restore.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From arc at b4restore.com Wed Jan 9 13:23:17 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 13:23:17 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: References: Message-ID: <1886db2cdf074bf0aaa151c395d300d5@B4RWEX01.internal.b4restore.com> Hi Andrew, Where can I request such a feature? ? Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Andrew Beattie Sendt: 9. januar 2019 12:17 Til: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. Andi, All the CES nodes in the same cluster will share the same protocol exports if you want to separate them you need to create remote mount clusters and export the additional protocols via the remote mount it would actually be a useful RFE to have the ablity to create CES groups attached to the base cluster and by group create exports of different protocols, but its not available today. 
Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Andi Rhod Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Date: Wed, Jan 9, 2019 8:31 PM Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From arc at b4restore.com Wed Jan 9 13:24:30 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 13:24:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> References: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> Message-ID: Hi Simon, It was actually also the only solution I found if I want to keep them within the same cluster ? Thanks for the reply, I will see what we figure out ! Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Simon Thompson Sendt: 9. januar 2019 13:20 Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: > on behalf of "arc at b4restore.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 9 14:05:48 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 9 Jan 2019 14:05:48 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. 
In-Reply-To: References: Message-ID: <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com> The docs say: "CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol." Which would seem to indicate that the answer is "no".
This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist From aspalazz at us.ibm.com Wed Jan 9 17:21:03 2019 From: aspalazz at us.ibm.com (Aaron S Palazzolo) Date: Wed, 9 Jan 2019 17:21:03 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 18:04:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 18:04:47 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: References: , Message-ID: Can you use node affinity within CES groups? For example I have some shiny new servers I want to normally use. If I plan maintenance, I move the IP to another shiny box. But I also have some old off support legacy hardware that I'm happy to use in a DR situation (e.g. they are in another site). So I want a group for my SMB boxes and NFS boxes, but have affinity normally, and then have old hardware in case of failure. Whilst we're on protocols, are there any restrictions on using mixed architectures? I don't recall seeing this but... E.g. my new shiny boxes are ppc64le systems and my old legacy nodes are x86. It's all ctdb locking right .. (ok maybe mixing be and le hosts would be bad) (Sure I'll take a performance hit when I fail to the old nodes, but that is better than no service). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of aspalazz at us.ibm.com [aspalazz at us.ibm.com] Sent: 09 January 2019 17:21 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation Hey guys - I wanted to reply from the Scale development side..... 
First off, consider CES as a stack and the implications of such: - all protocols are installed on all nodes - if a specific protocol is enabled (SMB, NFS, OBJ, Block), it's enabled for all protocol nodes - if a specific protocol is started (SMB, NFS, OBJ, Block), it's started on all nodes by default, unless manually specified. As was indicated in the e-mail chain, you don't want to be removing rpms to create a subset of nodes serving various protocols as this will cause overall issues. You also don't want to manually be disabling protocols on some nodes/not others in order to achieve nodes that are 'only serving' SMB, for instance. Doing this manual stopping/starting of protocols isn't something that will adhere to failover. =============================================================== A few possible solutions if you want to segregate protocols to specific nodes are: =============================================================== 1) CES-Groups in combination with specific IPs / DNS hostnames that correspond to each protocol. - As mentioned, this can still be bypassed if someone attempts a mount using an IP/DNS name not set for their protocol. However, you could probably prevent some of this with an external firewall rule. - Using CES-Groups confines the IPs/DNS hostnames to very specific nodes 2) Firewall rules - This is best if done external to the cluster, and at a level that can restrict specific protocol traffic to specific IPs/hostnames - combine this with #1 for the best results. - Although it may work, try to stay away from crazy firewall rules on each protocol node itself as this can get confusing very quickly. It's easier if you can set this up external to the nodes. 3) Similar to above but using Node Affinity CES-IP policy - but no CES groups. - Upside is node-affinity will attempt to keep your CES-IPs associated with specific nodes. So if you restrict specific protocol traffic to specific IPs, then they'll stay on nodes you designate - Watch out for failovers. In error cases (or upgrades) where an IP needs to move to another node, it obviously can't remain on the node that's having issues. This means you may have protocol trafffic crossover when this occurs. 4) A separate remote cluster for each CES protocol - In this example, you could make fairly small remote clusters (although we recommend 2->3nodes at least for failover purposes). The local cluster would provide the storage. The remote clusters would mount it. One remote cluster could have only SMB enabled. Another remote cluster could have only OBJ enabled. etc... ------ I hope this helps a bit.... 
Regards, Aaron Palazzolo IBM Spectrum Scale Deployment, Infrastructure, Virtualization 9042 S Rita Road, Tucson AZ 85744 Phone: 520-799-5161, T/L: 321-5161 E-mail: aspalazz at us.ibm.com ----- Original message ----- From: gpfsug-discuss-request at spectrumscale.org Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: gpfsug-discuss Digest, Vol 84, Issue 4 Date: Wed, Jan 9, 2019 7:13 AM Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale protocol node service separation. (Andi Rhod Christiansen) 2. Re: Spectrum Scale protocol node service separation. (Sanchez, Paul) ---------------------------------------------------------------------- Message: 1 Date: Wed, 9 Jan 2019 13:24:30 +0000 From: Andi Rhod Christiansen To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: Content-Type: text/plain; charset="utf-8" Hi Simon, It was actually also the only solution I found if I want to keep them within the same cluster ? Thanks for the reply, I will see what we figure out ! Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Simon Thompson Sendt: 9. januar 2019 13:20 Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: > on behalf of "arc at b4restore.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 9 Jan 2019 14:05:48 +0000 From: "Sanchez, Paul" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. 
Message-ID: <53ec54bb621242109a789e51d61b1377 at mbxtoa1.winmail.deshaw.com> Content-Type: text/plain; charset="utf-8" The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?. This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 4 ********************************************* From christof.schmitt at us.ibm.com Wed Jan 9 18:10:13 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 9 Jan 2019 18:10:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jan 9 19:03:25 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 9 Jan 2019 19:03:25 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=8FBB09EFDFEBBB408f9e8a93df938690918c8FB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From carlz at us.ibm.com Wed Jan 9 19:19:20 2019 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 9 Jan 2019 19:19:20 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation Message-ID: ST>I believe socket based licenses are also about to or already no longer available ST>for new customers (existing customers can continue to buy). 
ST>Carl can probably comment on this? That is correct. Friday Jan 11 is the last chance for *new* customers to buy Standard Edition sockets. And as Simon says, those of you who are currently Sockets customers can remain on Sockets, buying additional licenses and renewing existing licenses. (IBM Legal requires me to add, any statement about the future is an intention, not a commitment -- but, as I've said before, as long as it's my decision to make, my intent is to keep Sockets as long as existing customers want them). And yes, one of the reasons I wanted to get away from Socket pricing is the kind of scenarios some of you brought up. Implementing the best deployment topology for your needs shouldn't be a licensing transaction. (Don't even get me started on client licenses). regards, Carl Zetie Program Director Offering Management for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From cblack at nygenome.org Wed Jan 9 19:11:40 2019 From: cblack at nygenome.org (Christopher Black) Date: Wed, 9 Jan 2019 19:11:40 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: Message-ID: <7399F5C1-A23F-4852-B912-0965E111D191@nygenome.org> We use realmd and some automation for sssd configs to get linux hosts to have local login and ssh tied to AD accounts, however we do not apply these configs on our protocol nodes. From: on behalf of Christof Schmitt Reply-To: gpfsug main discussion list Date: Wednesday, January 9, 2019 at 2:03 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" , Ingo Meents Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol There is the PAM module that would forward authentication requests to winbindd: /usr/lpp/mmfs/lib64/security/pam_gpfs-winbind.so In theory that can be added to the PAM configuration in /etc/pam.d/. On the other hand, we have never tested this nor claimed support, so there might be reasons why this won't work. Other customers have configured sssd manually in addition to the Scale authentication to allow user logon and authentication for sudo. If the request here is to configure AD authentication through mmuserauth and that should also provide user logon, that should probably be treated as a feature request through RFE. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Lyle Gayne" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Ingo Meents Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Date: Tue, Jan 8, 2019 2:54 PM Adding Ingo Meents for response [Inactive hide details for "Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory a]"Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory authentication on CES From: "Rob Logie" To: gpfsug-discuss at spectrumscale.org Date: 01/08/2019 04:50 PM Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? 
The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes. Thanks ! Regards, Rob Logie IT Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 8 22:12:22 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 8 Jan 2019 22:12:22 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Message-ID: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Wed Jan 9 21:37:04 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 9 Jan 2019 21:37:04 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? 
In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> I suppose you could run the underlying tslsfileset, though that's probably not the answer you're looking for. Out of curiousity, what are you hoping to gain by not running mmlsfileset? Is the problem scaling due to the number of filesets that you have defined? On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Happy New Year to all! Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best! > > Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done. > > The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all. > > Thanks in advance. > > Kevin > > P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. > P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. > > ??? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From S.J.Thompson at bham.ac.uk Wed Jan 9 22:42:01 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 22:42:01 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: Hi Kevin, Have you looked at the rest API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! 
Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From Paul.Sanchez at deshaw.com Wed Jan 9 23:03:08 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 9 Jan 2019 23:03:08 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> Message-ID: <3d408800d50648dfae25c3c95c1f04c1@mbxtoa1.winmail.deshaw.com> You could also wrap whatever provisioning script you're using (the thing that runs mmcrfileset), which must already be running as root, so that it also updates the cached text file afterward. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Skylar Thompson Sent: Wednesday, January 9, 2019 4:37 PM To: Kevin.Buterbaugh at Vanderbilt.Edu Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? I suppose you could run the underlying tslsfileset, though that's probably not the answer you're looking for. Out of curiousity, what are you hoping to gain by not running mmlsfileset? Is the problem scaling due to the number of filesets that you have defined? On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Happy New Year to all! Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best! > > Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done. > > The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. 
There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all. > > Thanks in advance. > > Kevin > > P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. > P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. > > ??? > Kevin Buterbaugh - Senior System Administrator Vanderbilt University - > Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 9 23:07:00 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 9 Jan 2019 23:07:00 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: Hi All, Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not. The Python script in question is to display quota information for a user. The mmlsquota command has a couple of issues: 1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it. For example, it will display all the filesets in a filesystem whether or not the user has access to them. So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format. And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for. Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 4:42 PM, Simon Thompson > wrote: Hi Kevin, Have you looked at the rest API? 
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&reserved=0 I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Jan 10 01:13:55 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 10 Jan 2019 01:13:55 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_runningmmlsfileset? In-Reply-To: References: , <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 10 20:42:50 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 10 Jan 2019 20:42:50 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_runningmmlsfileset? 
In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> Hi Andrew / All, Well, it does _sound_ useful, but in its current state it?s really not for several reasons, mainly having to do with it being coded in a moderately site-specific way. It needs an overhaul anyway, so I?m going to look at getting rid of as much of that as possible (there?s some definite low-hanging fruit there) and, for the site-specific things that can?t be gotten rid of, maybe consolidating them into one place in the code so that the script could be more generally useful if you just change those values. If I can accomplish those things, then yes, we?d be glad to share the script. But I?ve also realized that I didn?t _entirely_ answer my original question. Yes, mmlsquota will show me all the filesets ? but I also need to know the junction path for each of those filesets. One of the main reasons we wrote this script in the first place is that if you run mmlsquota you see that you have no limits on about 60 filesets (currently we use fileset quotas only on our filesets) ? and that?s because there are no user (or group) quotas in those filesets. The script, however, reads in that text file that is created nightly by root that is nothing more than the output of ?mmlsfileset ?, gets the junction path, looks up the GID of the junction path, and sees if you?re a member of that group. If you?re not, well, no sense in showing you anything about that fileset. But, of course, if you are a member of that group, then we do want to show you the fileset quota for that fileset. So ? my question now is, ?Is there a way for a non-root user? to get the junction path for the fileset(s)? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 7:13 PM, Andrew Beattie > wrote: Kevin, That sounds like a useful script would you care to share? Thanks Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Buterbaugh, Kevin L" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Date: Thu, Jan 10, 2019 9:22 AM Hi All, Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not. The Python script in question is to display quota information for a user. The mmlsquota command has a couple of issues: 1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it. For example, it will display all the filesets in a filesystem whether or not the user has access to them. So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format. And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for. Thanks! Kevin ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 4:42 PM, Simon Thompson > wrote: Hi Kevin, Have you looked at the rest API? https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&reserved=0 I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cc1ffac821c5f4524104908d67698e948%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826796467009700&sdata=Xfz4JiItI8ukHgnvO5YoN27jVpk6Ngsk03NtMrKJcHk%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Fri Jan 11 12:50:17 2019 From: p.childs at qmul.ac.uk (Peter Childs) Date: Fri, 11 Jan 2019 12:50:17 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. 
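
For completeness, here is a minimal sketch of the REST API route suggested above, which would also return the junction path asked about earlier, since the fileset objects carry their configuration. The endpoint, GET /scalemgmt/v2/filesystems/{filesystem}/filesets, is from the documented API; the hostname, credentials, filesystem name and exact JSON field names below are placeholders and assumptions, so inspect a raw response from your own GUI node before relying on them.

#!/usr/bin/env python3
# Sketch: list filesets (and their junction paths) via the Scale management API.
# Hostname, credentials and filesystem name are placeholders; the JSON field
# names may differ by release, hence the defensive .get() lookups below.
import requests

GUI_HOST = "gui-node.example.com"   # hypothetical GUI/REST server
API_USER = "fileset-reader"         # hypothetical, ideally a read-only role
API_PASS = "secret"                 # hypothetical
FILESYSTEM = "gpfs0"                # placeholder device name

url = f"https://{GUI_HOST}:443/scalemgmt/v2/filesystems/{FILESYSTEM}/filesets"
resp = requests.get(url, auth=(API_USER, API_PASS), verify=False, timeout=30)
resp.raise_for_status()              # verify=False is for lab use only

for fset in resp.json().get("filesets", []):
    name = fset.get("filesetName") or fset.get("name")
    path = (fset.get("config") or {}).get("path")   # junction path, if present
    print(name, path)

How tightly the API roles can be scoped to just this call is, as noted above, worth checking before handing credentials to a user-facing script.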
We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 11 14:19:28 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Jan 2019 14:19:28 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades Message-ID: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. 
We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Fri Jan 11 14:58:20 2019 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Fri, 11 Jan 2019 15:58:20 +0100 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: Hi Simon, you likely run into the following issue: APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896 This problem happens only if you use different host domains within a cluster and will mostly impact CES. It is unrelated to upgrade or mixed version clusters. Its has been fixed with 5.0.2, therefore I recommend to upgrade soon. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 11/01/2019 15:19 Subject: [gpfsug-discuss] A cautionary tale of upgrades Sent by: gpfsug-discuss-bounces at spectrumscale.org I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. 
The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Fri Jan 11 15:00:51 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 11 Jan 2019 15:00:51 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: Hallo Simon, Welcome to the Club. These behavior are a Bug in tsctl to change the DNS names . We had this already 4 weeks ago. The fix was Update to 5.0.2.1. Regards Renar Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Am 11.01.2019 um 15:19 schrieb Simon Thompson >: I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? 
we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 11 15:48:50 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Jan 2019 15:48:50 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>, Message-ID: Could well be. Still it's pretty scary that this sort of thing could hit you way after the different DNS name nodes were added. It might be months before you restart the CES nodes. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of MDIETZ at de.ibm.com [MDIETZ at de.ibm.com] Sent: 11 January 2019 14:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] A cautionary tale of upgrades Hi Simon, you likely run into the following issue: APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896 This problem happens only if you use different host domains within a cluster and will mostly impact CES. 
It is unrelated to upgrade or mixed version clusters. Its has been fixed with 5.0.2, therefore I recommend to upgrade soon. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 11/01/2019 15:19 Subject: [gpfsug-discuss] A cautionary tale of upgrades Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. 
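A quick way to spot this symptom is to look for node names whose last two DNS labels are identical. The sketch below is only an illustration: it assumes tsctl lives in the usual /usr/lpp/mmfs/bin location and prints the comma-separated node list shown above.

    #!/usr/bin/env python3
    # Sketch: flag entries from "tsctl shownodes up" whose last two DNS labels
    # are identical (the ".cluster.cluster" symptom shown above). The binary
    # path and the comma-separated output format are taken from this thread.
    import subprocess

    TSCTL = "/usr/lpp/mmfs/bin/tsctl"

    out = subprocess.run([TSCTL, "shownodes", "up"],
                         capture_output=True, text=True, check=True).stdout
    for name in out.replace("\n", "").split(","):
        labels = name.strip().split(".")
        if len(labels) >= 2 and labels[-1] == labels[-2]:
            print("suspicious entry:", name.strip())
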
We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Fri Jan 11 17:31:35 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 11 Jan 2019 14:31:35 -0300 Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset? In-Reply-To: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> Message-ID: ?Is there a way for a non-root user? to get the junction path for the fileset(s)? Presuming the user has some path to some file in the fileset... Issue `mmlsattr -L path` then "walk" back towards the root by discarding successive path suffixes and watch for changes in the fileset name field Why doesn't mmlsfileset work for non-root users? I don't know. Perhaps the argument has to do with security or confidentiality. On my test system it gives a bogus error, when it should say something about root or super-user. -------------- next part -------------- An HTML attachment was scrubbed... URL: From JRLang at uwyo.edu Fri Jan 11 16:24:17 2019 From: JRLang at uwyo.edu (Jeffrey R. Lang) Date: Fri, 11 Jan 2019 16:24:17 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> Message-ID: What we do is the use ?mmlsquota -Y ? which will list out all the filesets in an easily parseable format. And the command can be run by the user. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Friday, January 11, 2019 6:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? ? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources. We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. 
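Marc's suggestion above (run mmlsattr -L and walk back towards the root, watching the fileset name field) can be scripted without root privileges. A minimal sketch, assuming mmlsattr -L prints a line containing "fileset name" as described; adjust the match if your release labels it differently:

    #!/usr/bin/env python3
    # Sketch of the "walk back towards the root" idea: given a path inside a
    # fileset, find the highest directory that still reports the same fileset
    # name in "mmlsattr -L" output; that directory is (or contains) the junction.
    import os
    import subprocess
    import sys

    MMLSATTR = "/usr/lpp/mmfs/bin/mmlsattr"

    def fileset_of(path):
        out = subprocess.run([MMLSATTR, "-L", path],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if "fileset name" in line.lower():
                return line.split(":", 1)[1].strip()
        return None

    start = sys.argv[1]
    target = fileset_of(start)
    junction = start
    path = start
    while True:
        parent = os.path.dirname(path.rstrip("/"))
        if parent == path or not parent:
            break
        if fileset_of(parent) != target:
            break       # parent is in a different fileset, so keep 'junction'
        junction = parent
        path = parent
    print("fileset %s appears to be linked at %s" % (target, junction))

Pointing something like this at one known file per fileset, or at the candidate directories under the usual parent paths, gives a name-to-junction mapping without needing mmlsfileset.
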
We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sat Jan 12 03:07:29 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sat, 12 Jan 2019 03:07:29 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> Message-ID: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> Hi All, I appreciate the time several of you have taken to respond to my inquiry. However, unless I?m missing something - and my apologies if I am - none so far appear to allow me to obtain the list of junction paths as a non-root user. Yes, mmlsquota shows all the filesets. But from there I need to then be able to find out where that fileset is mounted in the directory tree so that I can see who the owner and group of that directory are. Only if the user running the script is either the owner or a member of the group do I want to display the fileset quota for that fileset to the user. Thanks again? Kevin On Jan 11, 2019, at 10:24 AM, Jeffrey R. Lang > wrote: What we do is the use ?mmlsquota -Y ? which will list out all the filesets in an easily parseable format. And the command can be run by the user. 
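The -Y output of the mm commands follows a common convention: colon-delimited records plus a HEADER record that names the fields. A rough sketch of a generic parser; field names and sections vary between releases, so nothing is hard-coded, and values may be percent-encoded:

    #!/usr/bin/env python3
    # Generic parser for the colon-delimited -Y output of mm* commands
    # (mmlsquota -Y here). The HEADER record carries the field names, so we
    # build one dict per data record instead of hard-coding column positions.
    import subprocess

    def parse_y(cmd):
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        headers = {}
        rows = []
        for line in out.splitlines():
            cols = line.rstrip().split(":")
            if len(cols) < 3:
                continue
            key = (cols[0], cols[1])
            if cols[2] == "HEADER":
                headers[key] = cols
            elif key in headers:
                rows.append(dict(zip(headers[key], cols)))
        return rows

    for row in parse_y(["/usr/lpp/mmfs/bin/mmlsquota", "-Y"]):
        # print any field whose name mentions 'fileset'; the exact field
        # names differ between releases, which is why none are hard-coded
        for name, value in row.items():
            if "fileset" in name.lower() and value:
                print(value)
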
From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peter Childs Sent: Friday, January 11, 2019 6:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? ? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources. We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cee10c1e22a474fedceb408d678318231%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636828551398056004&sdata=F56RKhMef0zYjAj2dKFu3bAuq7xQvFoulYhwDnfN1Ms%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Jan 12 20:42:42 2019 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Sat, 12 Jan 2019 15:42:42 -0500 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? 
In-Reply-To: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> Message-ID: <13713.1547325762@turing-police.cc.vt.edu> On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: > But from there I need to then be able to find out where that fileset is > mounted in the directory tree so that I can see who the owner and group of that > directory are. You're not able to leverage a local naming scheme? There's no connection between the name of the fileset and where it is in the tree? I would hope there is, because otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will now be confused over what director(y/ies) need to be cleaned up. If your tool says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at /gpfs/foo/bar/baz then it's actionable. And if the user knows what the mapping is, your script can know it too.... From scottg at emailhosting.com Mon Jan 14 04:09:57 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Sun, 13 Jan 2019 23:09:57 -0500 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <13713.1547325762@turing-police.cc.vt.edu> Message-ID: Kevin, Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output. As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system. Sent from my BlackBerry - the most secure mobile device ? Original Message ? From: valdis.kletnieks at vt.edu Sent: January 12, 2019 4:07 PM To: gpfsug-discuss at spectrumscale.org Reply-to: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: > But from there I need to then be able to find out where that fileset is > mounted in the directory tree so that I can see who the owner and group of that > directory are. You're not able to leverage a local naming scheme? There's no connection between the name of the fileset and where it is in the tree?? I would hope there is, because otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will now be confused over what director(y/ies) need to be cleaned up.? If your tool says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at /gpfs/foo/bar/baz then it's actionable. And if the user knows what the mapping is, your script can know it too.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Mon Jan 14 06:31:28 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 14 Jan 2019 07:31:28 +0100 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From sandeep.patil at in.ibm.com Mon Jan 14 12:54:29 2019 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Mon, 14 Jan 2019 12:54:29 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q4 2018) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements http://www.redbooks.ibm.com/abstracts/redp5525.html?Open IBM Spectrum Scale Memory Usage https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2 Spectrum Scale and Containers https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/ IBM Elastic Storage Server Performance Graphical Visualization with Grafana https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/ Hadoop Performance for disaggregated compute and storage configurations based on IBM Spectrum Scale Storage https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/ EMS HA in ESS LE (Little Endian) environment https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/ What?s new in ESS 5.3.2 https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/ Administer your Spectrum Scale cluster easily https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/ Disaster Recovery using Spectrum Scale?s Active File Management https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/ Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS) https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/ Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1 https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 10/03/2018 08:48 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. 
Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? 
https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. 
This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Tue Jan 15 10:49:58 2019 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 15 Jan 2019 11:49:58 +0100 (CET) Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Message-ID: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 14 15:02:07 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 14 Jan 2019 15:02:07 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: Message-ID: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Hi Scott and Valdis (and everyone else), Thanks for your responses. Yes, we _could_ easily build a local naming scheme ? the name of the fileset matches the name of a folder in one of a couple of parent directories. However, an earlier response to my post asked if we?d be willing to share our script with the community and we would ? _if_ we can make it generic enough to be useful. Local naming schemes hardcoded in the script make it much less generically useful. Plus, it just seems to me that there ought to be a way to do this ? to get a list of fileset names from mmlsquota and then programmatically determine their junction path without having root privileges. GPFS has got to be storing that information somewhere, and I?m frankly quite surprised that no IBMer has responded with an answer to that. But I also know that when IBM is silent, there?s typically a reason. And yes, we could regularly create a static file ? in fact, that?s what we do now once per day (in the early morning hours). While this is not a huge deal - we only create / delete filesets a handful of times per month - on the day we do the script won?t function properly unless we manually update the file. I?m wanting to eliminate that, if possible ? which as I stated in the preceding paragraph, I have a hard time believing is not possible. I did look at the list of callbacks again (good thought!) and there?s not one specifically related to the creation / deletion of a fileset. There was only one that I saw that I think could even possibly be of use ? ccrFileChange. Can anyone on the list confirm or deny that the creation / deletion of a fileset would cause that callback to be triggered?? 
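If that callback does fire on fileset changes, the registration itself would be simple. The sketch below is only an illustration: the event name comes straight from the callback list mentioned here, but whether it actually fires on fileset creation/deletion is exactly the open question, and the script path, device name and output file are made-up examples.

    #!/usr/bin/env python3
    # Hypothetical callback script: regenerate a world-readable fileset list
    # whenever the registered event fires. A registration might look like
    # (verify the event name against mmaddcallback's documented list first):
    #
    #   mmaddcallback filesetListRefresh \
    #       --command /usr/local/sbin/refresh_filesets.py \
    #       --event ccrFileChange --async
    #
    import os
    import subprocess

    MMLSFILESET = "/usr/lpp/mmfs/bin/mmlsfileset"
    FILESYSTEM = "gpfs0"                    # assumed device name
    OUTFILE = "/gpfs/gpfs0/.fileset-list"   # assumed, somewhere users can read

    out = subprocess.run([MMLSFILESET, FILESYSTEM, "-L"],
                         capture_output=True, text=True).stdout
    tmp = OUTFILE + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(out)
    os.chmod(tmp, 0o644)
    os.replace(tmp, OUTFILE)   # atomic rename so readers never see a partial file
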
If it is triggered, then we could use that to update the static filesets within a minute or two of the change being made, which would definitely be acceptable. I realize that many things likely trigger a ccrFileChange, so I?m thinking of having a callback script that checks the current list of filesets against the static file and updates that appropriately. Thanks again for the responses? Kevin > On Jan 13, 2019, at 10:09 PM, Scott Goldman wrote: > > Kevin, > Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output. > > As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system. > > Sent from my BlackBerry - the most secure mobile device > > Original Message > From: valdis.kletnieks at vt.edu > Sent: January 12, 2019 4:07 PM > To: gpfsug-discuss at spectrumscale.org > Reply-to: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? > > On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: >> But from there I need to then be able to find out where that fileset is >> mounted in the directory tree so that I can see who the owner and group of that >> directory are. > > You're not able to leverage a local naming scheme? There's no connection between > the name of the fileset and where it is in the tree? I would hope there is, because > otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will > now be confused over what director(y/ies) need to be cleaned up. If your tool > says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at > /gpfs/foo/bar/baz then it's actionable. > > And if the user knows what the mapping is, your script can know it too.... > From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 14 15:02:07 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 14 Jan 2019 15:02:07 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: Message-ID: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Hi Scott and Valdis (and everyone else), Thanks for your responses. Yes, we _could_ easily build a local naming scheme ? the name of the fileset matches the name of a folder in one of a couple of parent directories. However, an earlier response to my post asked if we?d be willing to share our script with the community and we would ? _if_ we can make it generic enough to be useful. Local naming schemes hardcoded in the script make it much less generically useful. Plus, it just seems to me that there ought to be a way to do this ? to get a list of fileset names from mmlsquota and then programmatically determine their junction path without having root privileges. GPFS has got to be storing that information somewhere, and I?m frankly quite surprised that no IBMer has responded with an answer to that. But I also know that when IBM is silent, there?s typically a reason. And yes, we could regularly create a static file ? in fact, that?s what we do now once per day (in the early morning hours). While this is not a huge deal - we only create / delete filesets a handful of times per month - on the day we do the script won?t function properly unless we manually update the file. I?m wanting to eliminate that, if possible ? which as I stated in the preceding paragraph, I have a hard time believing is not possible. 
I did look at the list of callbacks again (good thought!) and there?s not one specifically related to the creation / deletion of a fileset. There was only one that I saw that I think could even possibly be of use ? ccrFileChange. Can anyone on the list confirm or deny that the creation / deletion of a fileset would cause that callback to be triggered?? If it is triggered, then we could use that to update the static filesets within a minute or two of the change being made, which would definitely be acceptable. I realize that many things likely trigger a ccrFileChange, so I?m thinking of having a callback script that checks the current list of filesets against the static file and updates that appropriately. Thanks again for the responses? Kevin > On Jan 13, 2019, at 10:09 PM, Scott Goldman wrote: > > Kevin, > Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output. > > As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system. > > Sent from my BlackBerry - the most secure mobile device > > Original Message > From: valdis.kletnieks at vt.edu > Sent: January 12, 2019 4:07 PM > To: gpfsug-discuss at spectrumscale.org > Reply-to: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? > > On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: >> But from there I need to then be able to find out where that fileset is >> mounted in the directory tree so that I can see who the owner and group of that >> directory are. > > You're not able to leverage a local naming scheme? There's no connection between > the name of the fileset and where it is in the tree? I would hope there is, because > otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will > now be confused over what director(y/ies) need to be cleaned up. If your tool > says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at > /gpfs/foo/bar/baz then it's actionable. > > And if the user knows what the mapping is, your script can know it too.... > From makaplan at us.ibm.com Tue Jan 15 14:46:18 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 15 Jan 2019 11:46:18 -0300 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 15 15:11:41 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 15 Jan 2019 15:11:41 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? 
In-Reply-To: References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: <0D5558D9-9003-4B95-9A37-42321E03114D@vanderbilt.edu> Hi Marc (All), Yes, I can easily determine where filesets are linked here ? it is, as you said, in just one or two paths. The script as it stands now has been doing that for several years and only needs a couple of relatively minor tweaks to be even more useful to _us_ by whittling down a couple of edge cases relating to fileset creation / deletion. However ? there was a request to share the script with the broader community ? something I?m willing to do if I can get it in a state where it would be useful to others with little or no modification. Anybody who?s been on this list for any length of time knows how much help I?ve received from the community over the years. I truly appreciate that and would like to give back, even in a minor way, if possible. But in order to do that the script can?t be full of local assumptions ? that?s it in a nutshell ? that?s why I want to programmatically determine the junction path at run time as a non-root user. I?ll also mention here that early on in this thread Simon Thompson suggested looking into the REST API. Sure enough, you can get the information that way ? but, AFAICT, that would require the script to contain a username / password combination that would allow anyone with access to the script to then use that authentication information to access other information within GPFS that we probably don?t want them to have access to. If I?m mistaken about that, then please feel free to enlighten me. Thanks again? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 15, 2019, at 8:46 AM, Marc A Kaplan > wrote: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbd2c28fdb60041f3434e08d67af83b11%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636831603904557717&sdata=A74TTq%2FQvyhEMHaolklbiMAEnaGVuHNiyhVYfn4wRek%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Jan 15 15:36:39 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 15 Jan 2019 16:36:39 +0100 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors In-Reply-To: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> References: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> Message-ID: Hello Iban, the pmsensor and pmcollector packages together with the GUI dashboard and statistics pages are not designed to be a general monitoring solution. For example. 
in many places we are filtering for GPFS nodes that are known to be cluster members and we try to match host names to GPFS node names. This causes the lack of nodes in GUI charts you are experiencing. In addition. the CLI based setup and management of the sensors assume that sensor nodes are cluster nodes. We are not intending to open up the internal management and views for data outside the cluster in the futute.- The requirements to provide plotting, filtering, aggregation and calculation in a general plotting environment can be very diverse and we may not be able to handle this. So while we are flattered by the request to use our charting capabilities as a general solution, we propose to use tools like grafana as more general solution. Please note that the GUI charts and dashboards have URLs that allow them to be hyperlinked, so you could also combine other web based charting tools together with the GUI based charts. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Iban Cabrillo To: gpfsug-discuss Date: 15.01.2019 12:05 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D690169.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jan 15 15:57:39 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Jan 2019 15:57:39 +0000 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Message-ID: Understand that you don?t want to install Grafana on its own, but there is a GPFS Grafana bridge I believe that would allow you to include the GPFS collected data in a Grafana dashboard. So if not wanting to setup sensors for that data is the reason you don?t want Grafana, then using the bridge might pull the data you want? Simon From: on behalf of "cabrillo at ifca.unican.es" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 15 January 2019 at 11:05 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. 
Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Jan 16 08:16:58 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 16 Jan 2019 08:16:58 +0000 Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset? In-Reply-To: References: , <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15475476039319.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.154754760393110.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.154754760393111.png Type: image/png Size: 1134 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Jan 16 12:57:18 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Jan 2019 09:57:18 -0300 Subject: [gpfsug-discuss] Get fileset and other info via Rest API and/or GUI In-Reply-To: References: , <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: Good to know the "Rest" does it for us. Since I started working on GPFS internals and CLI utitlities around Release 3.x, I confess I never had need of the GUI or the Rest API server. In fact I do most of my work remotely via Putty/Xterm/Emacs and only once-in-a-while even have an XWindows or VNC server/view of a GPFS node! So consider any of my remarks in that context. So I certainly defer to others when it comes to Spectrum Scale GUIs, "Protocol" servers and such. If I'm missing anything great, perhaps some kind soul will send me a note offline from this public forum. --Marc.K of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Wed Jan 16 16:18:16 2019 From: spectrumscale at kiranghag.com (KG) Date: Wed, 16 Jan 2019 21:48:16 +0530 Subject: [gpfsug-discuss] Filesystem automount issues Message-ID: Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jan 16 16:33:25 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Jan 2019 11:33:25 -0500 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: What does the output of "mmlsmount all -L" show? 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: KG To: gpfsug main discussion list Date: 01/16/2019 11:19 AM Subject: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Wed Jan 16 18:14:39 2019 From: spectrumscale at kiranghag.com (KG) Date: Wed, 16 Jan 2019 23:44:39 +0530 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: It shows that the filesystem is not mounted On Wed, Jan 16, 2019, 22:03 Frederick Stock What does the output of "mmlsmount all -L" show? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: KG > To: gpfsug main discussion list > Date: 01/16/2019 11:19 AM > Subject: [gpfsug-discuss] Filesystem automount issues > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi > > IHAC running Scale 5.x on RHEL 7.5 > > One out of two filesystems (/home) does not get mounted automatically at > boot. (/home is scale filesystem) > > The scale log does mention that the filesystem is mounted but mount output > says otherwise. > > There are no entries for /home in fstab since we let scale mount it. > Automount on scale and filesystem both have been set to yes. > > Any pointers to troubleshoot would be appreciated. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jan 16 18:38:07 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Jan 2019 13:38:07 -0500 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: Would it be possible for you to include the output of "mmlsmount all -L" and "df -k" in your response? 
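For reference, a minimal sketch of the information worth gathering for a mount-at-boot problem like this (all standard Scale/RHEL commands; the filesystem name "home" is just the example from this thread):

mmlsconfig autoload                          # daemon set to start at boot?
mmlsfs home -A                               # automatic mount option on the filesystem
mmlsmount home -L                            # which nodes actually have it mounted
df -k | grep -i home                         # what the OS thinks
grep -i home /var/adm/ras/mmfs.log.latest    # GPFS log on the affected node
systemctl status gpfs.service                # did the daemon come up cleanly at boot?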
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: KG To: gpfsug main discussion list Date: 01/16/2019 01:15 PM Subject: Re: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org It shows that the filesystem is not mounted On Wed, Jan 16, 2019, 22:03 Frederick Stock To: gpfsug main discussion list Date: 01/16/2019 11:19 AM Subject: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 16 20:01:53 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 16 Jan 2019 21:01:53 +0100 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 17 11:35:13 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Jan 2019 11:35:13 +0000 Subject: [gpfsug-discuss] Node expels Message-ID: We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 17 11:46:19 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 17 Jan 2019 13:46:19 +0200 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. 
it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 17 13:28:15 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 17 Jan 2019 15:28:15 +0200 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlewars at us.ibm.com Thu Jan 17 14:30:45 2019 From: jlewars at us.ibm.com (John Lewars) Date: Thu, 17 Jan 2019 09:30:45 -0500 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 17 19:02:06 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Jan 2019 19:02:06 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: , Message-ID: So we've backed out a bunch of network tuning parameters we had set (based on the GPFS wiki pages), they've been set a while but um ... maybe they are causing issues. Secondly, we've noticed in dump tscomm that we see connection broken to a node, and then the node ID is usually the same node, which is a bit weird to me. 
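As an aside for anyone chasing the same symptoms, a rough sketch of looking at those connections from both ends while it is happening; 1191 is the default daemon port, so adjust the filter if tscTcpPort is set to something else:

ss -ti '( sport = :1191 or dport = :1191 )'   # kernel view: rto, retransmits, congestion state
mmdiag --network                              # GPFS view of its connections to each node
mmfsadm dump tscomm                           # the low-level dump referred to above (diagnostic use only)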
We've also just updated firmware on the Intel nics (the x722) which is part of the Skylake board. And specifically its the newer skylake kit we see this problem on. We've a number of issues with the x722 firmware (like it won't even bring a link up when plugged into some of our 10GbE switches, but that's another story). We've also dropped the bonded links from these nodes, just in case its related... Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jlewars at us.ibm.com [jlewars at us.ibm.com] Sent: 17 January 2019 14:30 To: Tomer Perry; gpfsug main discussion list Cc: Yong Ze Chen Subject: Re: [gpfsug-discuss] Node expels >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels ________________________________ Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
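Since the expels seem to track one adapter type, a quick comparison of driver/firmware level and bond state across the suspect nodes can help; the node names, interface and bond device below are placeholders:

for n in node1 node2 node3; do
    echo "== $n"
    ssh "$n" 'ethtool -i eth0 | grep -E "driver|firmware"'
    ssh "$n" 'grep -E "Bonding Mode|MII Status|Slave Interface" /proc/net/bonding/bond0 2>/dev/null'
done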
URL: From orichards at pixitmedia.com Thu Jan 17 20:52:50 2019 From: orichards at pixitmedia.com (Orlando Richards) Date: Thu, 17 Jan 2019 20:52:50 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: <4e0ea3c4-3076-e9a0-55c3-58f98be96d9b@pixitmedia.com> Hi Simon, We've had to disable the offload's for Intel cards in many situations with the i40e drivers - Redhat have an article about it: https://access.redhat.com/solutions/3662011 ------- Orlando On 17/01/2019 19:02, Simon Thompson wrote: > So we've backed out a bunch of network tuning parameters we had set > (based on the GPFS wiki pages), they've been set a while but um ... > maybe they are causing issues. > > Secondly, we've noticed in dump tscomm that we see connection broken > to a node, and then the node ID is usually the same node, which is a > bit weird to me. > > We've also just updated firmware on the Intel nics (the x722) which is > part of the Skylake board. And specifically its the newer skylake kit > we see this problem on. We've a number of issues with the x722 > firmware (like it won't even bring a link up when plugged into some of > our 10GbE switches, but that's another story). > > We've also dropped the bonded links from these nodes, just in case its > related... > > Simon > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of > jlewars at us.ibm.com [jlewars at us.ibm.com] > *Sent:* 17 January 2019 14:30 > *To:* Tomer Perry; gpfsug main discussion list > *Cc:* Yong Ze Chen > *Subject:* Re: [gpfsug-discuss] Node expels > > >They always appear to be to a specific type of hardware with the same > Ethernet controller, > > That makes me think you might be seeing packet loss that could require > ring buffer tuning (the defaults and limits will differ with different > ethernet adapters). > > The expel section in the slides on this page has been expanded to > include a 'debugging expels section' (slides 19-20, which also > reference ring buffer tuning): > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 > > Regards, > John Lewars > Spectrum Scale Performance, IBM Poughkeepsie > > > > > From: Tomer Perry/Israel/IBM > To: gpfsug main discussion list > Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN > Date: 01/17/2019 08:28 AM > Subject: Re: [gpfsug-discuss] Node expels > ------------------------------------------------------------------------ > > > Hi, > > I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). > > As written on the slide: > One of the best ways to determine if a network layer problem is root > cause for an expel is to look at the low-level socket details dumped > in the ?extra? log data (mmfs dump all) saved as part of automatic > data collection on Linux GPFS nodes. > > So, the idea is that in expel situation, we dump the socket state from > the OS ( you can see the same using 'ss -i' for example). > In your example, it shows that the ca_state is 4, there are > retransmits, high rto and all the point to a network problem. 
> You can find more details here: > http://www.yonch.com/tech/linux-tcp-congestion-control-internals > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: ? ?+1 720 3422758 > Israel Tel: ? ? ?+972 3 9188625 > Mobile: ? ? ? ? +972 52 2554625 > > > > > > From: "Tomer Perry" > To: gpfsug main discussion list > Date: 17/01/2019 13:46 > Subject: Re: [gpfsug-discuss] Node expels > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Simon, > > Take a look at > _http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf_slide > 13. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: ? ?+1 720 3422758 > Israel Tel: ? ? ?+972 3 9188625 > Mobile: ? ? ? ? +972 52 2554625 > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" > Date: 17/01/2019 13:35 > Subject: [gpfsug-discuss] Node expels > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > We?ve recently been seeing quite a few node expels with messages of > the form: > > 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address > 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is > unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 > probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 > rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 > 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data > collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster > 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug > data to proto-pg-pf01.bear.cluster localNode > 2019-01-17_11:19:30.882+0000: [I] Calling user exit script > gpfsSendRequestToNodes: event sendRequestToNodes, Async command > /usr/lpp/mmfs/bin/mmcommon. > 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for > a commMsgCheckMessages reply from node 10.20.0.58 > proto-pg-pf01.bear.cluster. Sending expel message. > > On the client node, we see messages of the form: > > 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data > collection request from 10.10.0.33 > 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp > debug data on this node. > 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data > collection request from 10.10.0.33 > 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug > data on this node. > 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from > cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b > ber-les-nsd01-data.bb2.cluster in rds.gpfs.server > 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data > collection request from 10.20.0.56 > > They always appear to be to a specific type of hardware with the same > Ethernet controller, though the nodes are split across three data > centres and we aren?t seeing link congestion on the links between them. > > On the node I listed above, it?s not actually doing anything either as > the software on it is still being installed (i.e. it?s not doing GPFS > or any other IO other than a couple of home directories). > > Any suggestions on what ?(socket 153) state is unexpected? means? 
> > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jan 18 15:23:09 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Jan 2019 15:23:09 +0000 Subject: [gpfsug-discuss] DSS-G Message-ID: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Anyone out their with a DSS-G using SR650 servers? We have one and after some hassle we have finally got the access to the software downloads and I have been reading through the documentation to familiarize myself with the upgrade procedure. Skipping over the shear madness of that which appears to involved doing a complete netboot reisntall of the nodes for every upgrade, it looks like we have wrong hardware. It all came in a Lenovo rack with factory cabling so one assumes it would be correct. However the "Manufactoring Preload Procedure" document says The DSS-G installation scripts assume that IPMI access to the servers is set up through the first regular 1GbE Ethernet port of the server (marked with a green star in figure 21) in shared mode, not through the dedicated IPMI port under the first three PCIe slots of the SR650 server?s back, and not on the lower left side of the x3650 M5 server?s back. Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to the dedicated IPMI port. Oh great, reinstalling the OS for an update is already giving me the screaming heebie jeebies, but now my factory delivered setup is wrong. So in my book increased chance of the install procedure writing all over the disks during install and blowing away the NSD's. Last time I was involved in an net install of RHEL (well CentOS but makes little difference) onto a GPFS not with attached disks the installer wrote all over the NSD descriptors and destroyed the file system. So before one plays war with Lenovo for shipping an unsupported configuration I was wondering how other DSS-G's with SR650's have come from the factory. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Fri Jan 18 16:02:48 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Jan 2019 16:02:48 +0000 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Message-ID: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> I have several. One of mine was shipped for customer rack (which happened to be an existing Lenovo rack anyway), the other was based on 3560m5 so cabled differently then anyway (and its now a franken DSS-G as we upgraded the servers to SR650 and added an SSD tray, but I have so much non-standard Lenovo config stuff in our systems ....) If you bond the LOM ports together then you can't use the XCC in shared mode. But the installer scripts will make it shared when you reinstall/upgrade. Well, it can half work in some cases depending on how you have your switch connected. For example we set the switch to fail back to non-bond mode (relatively common now), which is find when the OS is not booted, you can talk to XCC. But as soon as the OS boots and it bonds, the switch port turns into a bond/trunk port and BAM, you can no longer talk to the XCC port. We have an xcat post script to put it back to being dedicated on the XCC port. So during install you lose access for a little while whilst the Lenovo script runs before my script puts it back again. And if you read the upgrade guide, then it tells you to unplug the SAS ports before doing the reinstall (OK I haven't checked the 2.2a upgrade guide, but it always did). HOWEVER, the xcat template for DSS-G should also black list the SAS driver to prevent it seeing the attached JBOD storage. AND GPFS now writes proper GPT headers as well to the disks which the installer should then leave alone. (But yes, haven't we all done an install and wiped the disk headers ... GPFS works great until you try to mount the file-system sometime later) On the needing to reinstall ... I agree I don't like the reinstall to upgrade between releases, but if you look what it's doing it sorta half makes sense. For example it force flashes an exact validated firmware onto the SAS cards and forces the port config etc onto the card to being in a known current state. I don't like it, but I see why it's done like that. We have in the past picked the relevant bits out (e.g. disk firmware and GPFS packages), and done just those, THIS IS NOT SUPPORTED, but we did pick it apart to see what had changed. If you go to 2.2a as well, the gui is now moved out (it was a bad idea to install on the DSS-G nodes anyway I'm sure), and the pmcollector package magically doesn't get installed either on the DSS-G nodes. Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will flash the firmware to Intel 4.0 release for the X722. And that doesn't work if you have Mellanox Ethernet switches running Cumulus. (we proved it was the firmware by upgrading another SR650 to the latest firmware and suddenly it no longer works) - you won't get a link up, even at PXE time so not a driver issue. And if you have a VDX switch you need another workaround ... Simon ?On 18/01/2019, 15:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: Anyone out their with a DSS-G using SR650 servers? We have one and after some hassle we have finally got the access to the software downloads and I have been reading through the documentation to familiarize myself with the upgrade procedure. 
Skipping over the shear madness of that which appears to involved doing a complete netboot reisntall of the nodes for every upgrade, it looks like we have wrong hardware. It all came in a Lenovo rack with factory cabling so one assumes it would be correct. However the "Manufactoring Preload Procedure" document says The DSS-G installation scripts assume that IPMI access to the servers is set up through the first regular 1GbE Ethernet port of the server (marked with a green star in figure 21) in shared mode, not through the dedicated IPMI port under the first three PCIe slots of the SR650 server?s back, and not on the lower left side of the x3650 M5 server?s back. Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to the dedicated IPMI port. Oh great, reinstalling the OS for an update is already giving me the screaming heebie jeebies, but now my factory delivered setup is wrong. So in my book increased chance of the install procedure writing all over the disks during install and blowing away the NSD's. Last time I was involved in an net install of RHEL (well CentOS but makes little difference) onto a GPFS not with attached disks the installer wrote all over the NSD descriptors and destroyed the file system. So before one plays war with Lenovo for shipping an unsupported configuration I was wondering how other DSS-G's with SR650's have come from the factory. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Fri Jan 18 17:14:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Jan 2019 17:14:52 +0000 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> Message-ID: <901117abe1768c9d02aae3b6cc9b5cf47dc3cc97.camel@strath.ac.uk> On Fri, 2019-01-18 at 16:02 +0000, Simon Thompson wrote: [SNIP] > > If you bond the LOM ports together then you can't use the XCC in > shared mode. But the installer scripts will make it shared when you > reinstall/upgrade. Well, it can half work in some cases depending on > how you have your switch connected. For example we set the switch to > fail back to non-bond mode (relatively common now), which is find > when the OS is not booted, you can talk to XCC. But as soon as the OS > boots and it bonds, the switch port turns into a bond/trunk port and > BAM, you can no longer talk to the XCC port. We don't have that issue :-) Currently there is nothing plugged into the LOM because we are using the Mellanox ConnectX4 card for bonded 40Gbps Ethernet to carry the GPFS traffic in the main with one of the ports on the two cards set to Infiniband so the storage can be mounted on an old cluster which only has 1Gb Ethernet (new cluster uses 10GbE networking to carry storage). However we have a shortage of 10GbE ports and the documentation says it should be 1GbE anyway, hence asking what Lenovo might have shipped to other people, as we have a disparity between what has been shipped and what the documentation says it should be like. 
[SNIP] > And if you read the upgrade guide, then it tells you to unplug the > SAS ports before doing the reinstall (OK I haven't checked the 2.2a > upgrade guide, but it always did). Well the 2.2a documentation does not say anything about that :-) I had basically decided however it was going to be necessary for safety purposes. While I do have a full backup of the file system I don't want to have to use it. > HOWEVER, the xcat template for DSS-G should also black list the SAS > driver to prevent it seeing the attached JBOD storage. AND GPFS now > writes proper GPT headers as well to the disks which the installer > should then leave alone. (But yes, haven't we all done an install and > wiped the disk headers ... GPFS works great until you try to mount > the file-system sometime later) Well I have never wiped my NSD's, just the numpty getting ready to prepare the CentOS6 upgrade for the cluster forgot to unzone the storage arrays (cluster had FC attached storage to all nodes for performance reasons, back in the day 4Gb FC was a lot cheaper than 10GbE and 1GbE was not fast enough) and wiped it for me :-( > On the needing to reinstall ... I agree I don't like the reinstall to > upgrade between releases, but if you look what it's doing it sorta > half makes sense. For example it force flashes an exact validated > firmware onto the SAS cards and forces the port config etc onto the > card to being in a known current state. I don't like it, but I see > why it's done like that. Except that does not require a reinstall of the OS to achieve. Reinstalling from scratch for an update is complete madness IMHO. > > If you go to 2.2a as well, the gui is now moved out (it was a bad > idea to install on the DSS-G nodes anyway I'm sure), and the > pmcollector package magically doesn't get installed either on the > DSS-G nodes. > Currently we don't have the GUI installed anywhere. I am not sure I trust IBM yet to not change the GUI completely again to be bothered getting it to work. > Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will > flash the firmware to Intel 4.0 release for the X722. And that > doesn't work if you have Mellanox Ethernet switches running > Cumulus. (we proved it was the firmware by upgrading another SR650 > to the latest firmware and suddenly it no longer works) - you won't > get a link up, even at PXE time so not a driver issue. And if you > have a VDX switch you need another workaround ... > We have Lenovo switches, so hopefully Lenovo tested with their own switches work ;-) Mind you I get this running the dssgcktopology tool Warning: Unsupported configuration of odd number of enclosures detected. Which nitwit wrote that script then? From the "Manufacturing Preload Procedure" for 2.2a on page 9 For the high density DSS models DSS-G210, DSS-G220, DSS-G240 and DSS-G260 with 3.5? NL-SAS disks (7.2k RPM), the DSS-G building block contains one, two, four or six Lenovo D3284 disk enclosures. Right so what is it then? Because one enclosure which is clearly an odd number of enclosures is allegedly an unsupported configuration according to the tool, but supported according to the documentation!!! JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From matthew.robinson02 at gmail.com Fri Jan 18 19:25:35 2019 From: matthew.robinson02 at gmail.com (Matthew Robinson) Date: Fri, 18 Jan 2019 14:25:35 -0500 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Message-ID: Hi Jonathan, In the last DSS 2.x tarballs there should a PDG included. This should provide alot of detail going over the solutions configuration and common problems for troubleshooting. Or at least the Problem Determantion Guide was there be for my department let me go. The shared IMM port is pretty standard from the 3650 to the SD530's for the most part. You should have a port marked shared on either and the IPMI interace is to be shared mode for dual subnet masks on the same NIC. This is is the standard xcat configuration from Sourcforge. If I am not mistaken the PDG should be stored in the first DSS-G version tarball for reference. Hope this helps, Matthew Robinson On Fri, Jan 18, 2019 at 10:23 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > > Anyone out their with a DSS-G using SR650 servers? > > We have one and after some hassle we have finally got the access to the > software downloads and I have been reading through the documentation to > familiarize myself with the upgrade procedure. > > Skipping over the shear madness of that which appears to involved doing > a complete netboot reisntall of the nodes for every upgrade, it looks > like we have wrong hardware. It all came in a Lenovo rack with factory > cabling so one assumes it would be correct. > > However the "Manufactoring Preload Procedure" document says > > The DSS-G installation scripts assume that IPMI access to the > servers is set up through the first regular 1GbE Ethernet port > of the server (marked with a green star in figure 21) in shared > mode, not through the dedicated IPMI port under the first three > PCIe slots of the SR650 server?s back, and not on the lower left > side of the x3650 M5 server?s back. > > Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to > the dedicated IPMI port. Oh great, reinstalling the OS for an update is > already giving me the screaming heebie jeebies, but now my factory > delivered setup is wrong. So in my book increased chance of the install > procedure writing all over the disks during install and blowing away > the NSD's. Last time I was involved in an net install of RHEL (well > CentOS but makes little difference) onto a GPFS not with attached disks > the installer wrote all over the NSD descriptors and destroyed the file > system. > > So before one plays war with Lenovo for shipping an unsupported > configuration I was wondering how other DSS-G's with SR650's have come > from the factory. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Matthew Robinson Comptia A+, Net+ 919.909.0494 matthew.robinson02 at gmail.com The greatest discovery of my generation is that man can alter his life simply by altering his attitude of mind. - William James, Harvard Psychologist. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Mon Jan 21 15:59:29 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 21 Jan 2019 15:59:29 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Mon Jan 21 16:03:13 2019 From: spectrumscale at kiranghag.com (KG) Date: Mon, 21 Jan 2019 21:33:13 +0530 Subject: [gpfsug-discuss] Dr site using full replication? Message-ID: Hi Folks Has anyone replicated scale node to a dr site by replicating boot disks and nsd ? The same hostnames and ip subnet would be available on the other site and cluster should be able to operate from any one location at a time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 21 16:02:50 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 21 Jan 2019 16:02:50 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu> Hi All, I just wanted to follow up on this thread ? the only way I have found to obtain a list of filesets and their associated junction paths as a non-root user is via the REST API (and thanks to those who suggested that). 
However, AFAICT querying the REST API via a script would expose the username / password used to do so to anyone who bothered to look at the code, which would in turn allow a knowledgeable and curious user to query the REST API themselves for other information we do not necessarily want to expose to them. Therefore, it is not an acceptable solution to us. Therefore, unless someone responds with a way to allow a non-root user to obtain fileset junction paths that doesn?t involve the REST API, I?m afraid I?m at a dead end in terms of making our quota usage Python script something that I can share with the broader community. It just has too much site-specific code in it. Sorry? Kevin P.S. In case you?re curious about how the quota script is obtaining those junction paths ? we have a cron job that runs once per hour on the cluster manager that dumps the output of mmlsfileset to a text file, which the script then reads. The cron job used to just run once per day and used to just run mmlsfileset. I have modified it to be a shell script which checks for the load average on the cluster manager being less than 10 and that there are no waiters of more than 10 seconds duration. If both of those conditions are true, it runs mmlsfileset. If either are not, it simply exits ? the idea being that one or both of those would likely be true if something were going on with the cluster manager that would cause the mmlsfileset to hang. I have also modified the quota script itself so that it checks that the junction path for a fileset actually exists before attempting to stat it (duh - should?ve done that from the start), which handles the case where a user would run the quota script and it would bomb off with an exception because the fileset was deleted and the cron job hadn?t run yet. If a new fileset is created, well, it just won?t get checked by the quota script until the cron job runs successfully. We have decided that this is an acceptable compromise. On Jan 15, 2019, at 8:46 AM, Marc A Kaplan > wrote: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeverdon at us.ibm.com Mon Jan 21 22:41:26 2019 From: jeverdon at us.ibm.com (Jodi E Everdon) Date: Mon, 21 Jan 2019 17:41:26 -0500 Subject: [gpfsug-discuss] post to list Message-ID: Jodi Everdon IBM New Technology Introduction (NTI) 2455 South Road Client Experience Validation Poughkeepsie, NY 12601 Email: jeverdon at us.ibm.com North America IBM IT Infrastructure: www.ibm.com/it-infrastructure -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 15606074.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From scale at us.ibm.com Mon Jan 21 23:34:31 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 21 Jan 2019 15:34:31 -0800 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Message-ID: Hello Renar, A few things to try: Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=frR4WiYT89JSgLnJMtRAlESzRXWW2YatEwsuuV8M810&s=FSjMBxMo8G8y3VR2A59hgIWaHPKPFNHU7RXcneIVCPE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Tue Jan 22 07:36:15 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Tue, 22 Jan 2019 07:36:15 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu> References: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu>, <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128480.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128481.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128482.png Type: image/png Size: 1134 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jan 22 14:35:02 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Jan 2019 14:35:02 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> OK we think we might have a reason for this. We run iptables on some of our management function nodes, and we found that in some cases, our config management tool can cause a ?systemctl restart iptables? to occur (the rule ordering generation was non deterministic meaning it could shuffle rules ? we fixed that and made it reload rather than restart). Which takes a fraction of a second, but it appears that this is sufficient for GPFS to get into a state. What I didn?t mention before was that we could get it into a state where the only way to recover was to shutdown the storage cluster and restart it. I?m not sure why normal expel and recovery doesn?t appear to work in this case, though we?re not 100% certain that its iptables restart. (we just have a very smoky gun at present). (I have a ticket with that question open). Maybe it?s a combination of having a default DROP policy on iptables as well - we have also switched to ACCEPT and added a DROP rule at the end of the ruleset which gives the same result. Simon From: on behalf of "jlewars at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 17 January 2019 at 14:31 To: Tomer Perry , "gpfsug-discuss at spectrumscale.org" Cc: Yong Ze Chen Subject: Re: [gpfsug-discuss] Node expels >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). 
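For reference, a minimal sketch of that kind of ring buffer check and adjustment, assuming an interface name of eth0; the interface name and sizes are placeholders and the usable maximum depends on the adapter:

    # Show the current and maximum RX/TX ring sizes
    ethtool -g eth0
    # Raise the rings towards the hardware maximum reported above, e.g.
    ethtool -G eth0 rx 4096 tx 4096
    # Look for drops that point at undersized rings
    ethtool -S eth0 | grep -iE 'drop|discard|fifo'
    ip -s link show eth0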
The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels ________________________________ Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmoye at quantlab.com Tue Jan 22 15:43:26 2019 From: rmoye at quantlab.com (Roger Moye) Date: Tue, 22 Jan 2019 15:43:26 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Message-ID: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing "/cygdrive/..." * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. 
If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jan 22 17:10:24 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 22 Jan 2019 17:10:24 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> Message-ID: Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. 
-Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Achim.Rehor at de.ibm.com Tue Jan 22 18:18:03 2019 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Tue, 22 Jan 2019 19:18:03 +0100 Subject: [gpfsug-discuss] Node expels In-Reply-To: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> References: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: From Renar.Grunenberg at huk-coburg.de Wed Jan 23 12:45:39 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 23 Jan 2019 12:45:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> Message-ID: <349cb338583a4c1d996677837fc65b6e@SMXRF105.msg.hukrf.de> Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. /bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. 
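One rough way to tell whether the time is going into Cygwin process-spawn overhead (the usual suspect on Windows) or into the file removal itself is a simple timing comparison; this is only a sketch, with arbitrary iteration counts and a throw-away scratch directory:

    # From a Cygwin bash prompt: cost of spawning external commands
    time for i in $(seq 1 50); do /bin/true; done
    # Cost of removing a comparable number of small files
    mkdir -p /tmp/rmtest && touch /tmp/rmtest/f{1..50}
    time /bin/rm -f /tmp/rmtest/f*
    rm -rf /tmp/rmtest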
Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). 
You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). 
All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Thu Jan 24 14:29:42 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 24 Jan 2019 14:29:42 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? Message-ID: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Hello, a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too. Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports using uid/gid of local accounts. We manage gids on the server side and allow NFS v3 client access only. I crosspost this to ganesha support and to the gpfsug mailing list. Thank you, Heiner Billich ganesha version: 2.5.3-ibm028.00.el7.x86_64 the ganesha config CacheInode { fd_hwmark_percent=60; fd_lwmark_percent=20; fd_limit_percent=90; lru_run_interval=90; entries_hwmark=1500000; } NFS_Core_Param { clustered=TRUE; rpc_max_connections=10000; heartbeat_freq=0; mnt_port=33247; nb_worker=256; nfs_port=2049; nfs_protocols=3,4; nlm_port=33245; rquota_port=33246; rquota_port=33246; short_file_handle=FALSE; mount_path_pseudo=true; } GPFS { fsal_grace=FALSE; fsal_trace=TRUE; } NFSv4 { delegations=FALSE; domainname=virtual1.com; grace_period=60; lease_lifetime=60; } Export_Defaults { access_type=none; anonymous_gid=-2; anonymous_uid=-2; manage_gids=TRUE; nfs_commit=FALSE; privilegedport=FALSE; protocols=3,4; sectype=sys; squash=root_squash; transports=TCP; } one export # === START /**** id=206 nclients=3 === EXPORT { Attr_Expiration_Time=60; Delegations=none; Export_id=206; Filesystem_id=42.206; MaxOffsetRead=18446744073709551615; MaxOffsetWrite=18446744073709551615; MaxRead=1048576; MaxWrite=1048576; Path="/****"; PrefRead=1048576; PrefReaddir=1048576; PrefWrite=1048576; Pseudo="/****"; Tag="****"; UseCookieVerifier=false; FSAL { Name=GPFS; } CLIENT { # === ****/X12SA === Access_Type=RW; Anonymous_gid=-2; Anonymous_uid=-2; Clients=X.Y.A.B/24; Delegations=none; Manage_Gids=TRUE; NFS_Commit=FALSE; PrivilegedPort=FALSE; Protocols=3; SecType=SYS; Squash=Root; Transports=TCP; } ?. 
-- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From truongv at us.ibm.com Thu Jan 24 18:17:45 2019 From: truongv at us.ibm.com (Truong Vu) Date: Thu, 24 Jan 2019 13:17:45 -0500 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: Message-ID: Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. /bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. 
________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. 
Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20190123_eff7ad74_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=JWv1FytE6pkOdJtqJV5sSVf3ZwV0B9FDZmfzI7LQEGk&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Fri Jan 25 09:13:53 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 25 Jan 2019 09:13:53 +0000 Subject: [gpfsug-discuss] [NFS-Ganesha-Support] does ganesha deny access for unknown UIDs? In-Reply-To: <35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com> References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> <35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com> Message-ID: <1F7557E9-FE60-4F37-BA0A-FD4C37E124BD@psi.ch> Hello Daniel, thank you. The clients do NFS v3 mounts, hence idmap is no option - as I know it's used in NFS v4 to map between uid/guid and names only? For a process to switch to a certain uid/guid in general one does not need a matching passwd entry? I see that with ACLs you get issues as they use names, and you can't do a server-side group membership lookup, and there may be more subtle issues. Anyway, I'll create the needed accounts on the server. By the way: We had the same issue with Netapp filers and it took a while to find the configuration option to allow 'unknown' uid/gid to access a nfs v3 export. I'll try to reproduce on a test system with increased logging to see what exactly goes wrong and maybe ask later to add a configuration option to ganesha to switch to a behaviour more similar to kernel-nfs. Many client systems at my site are legacy and run various operating systems, hence a complete switch to NFS v4 is unlikely to happen soon. 
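For the "increased logging" step, one possible sketch on CES-managed protocol nodes, assuming ganesha is administered through mmnfs and that LOG_LEVEL is accepted as a configuration attribute there; FULL_DEBUG is very verbose, so it should be reverted once the failure has been reproduced:

    # Raise the NFS-Ganesha log level cluster-wide, reproduce the failing client access, then revert
    mmnfs config change LOG_LEVEL=FULL_DEBUG
    # ... reproduce the access that is denied ...
    mmnfs config change LOG_LEVEL=EVENT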
cheers, Heiner -- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch ?On 24/01/19 16:35, "Daniel Gryniewicz" wrote: Hi. For local operating FSALs (like GPFS and VFS), the way Ganesha makes sure that a UID/GID combo has the correct permissions for an operation is to set the UID/GID of the thread to the one in the operation, then perform the actual operation. This way, the kernel and the underlying filesystem perform atomic permission checking on the op. This setuid/setgid will fail, of course, if the local system doesn't have that UID/GID to set to. The solution for this is to use NFS idmap to map the remote ID to a local one. This includes the ability to map unknown IDs to some local ID. Daniel On 1/24/19 9:29 AM, Billich Heinrich Rainer (PSI) wrote: > Hello, > > a local account on a nfs client couldn?t write to a ganesha nfs export > even with directory permissions 777. The solution was to create the > account on the ganesha servers, too. > > Please can you confirm that this is the intended behaviour? is there an > option to change this and to map unknown accounts to nobody instead? We > often have embedded Linux appliances or similar as nfs clients which > need to place some data on the nfs exports using uid/gid of local accounts. > > We manage gids on the server side and allow NFS v3 client access only. > > I crosspost this to ganesha support and to the gpfsug mailing list. > > Thank you, > > Heiner Billich > > ganesha version: 2.5.3-ibm028.00.el7.x86_64 From andy_kurth at ncsu.edu Fri Jan 25 16:08:12 2019 From: andy_kurth at ncsu.edu (Andy Kurth) Date: Fri, 25 Jan 2019 11:08:12 -0500 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Message-ID: I believe this is occurring because of the manage_gids=TRUE setting. The purpose of this setting is to overcome the AUTH_SYS 16 group limit. If true, Ganesha takes the UID and resolves all of the GIDs on the server. If false, the GIDs sent by the client are used. I ran a quick test by creating a local user on the client and exporting 2 shares with 777 permissions, one with manage_gids=TRUE and one with FALSE. The user could view the share and create files with manage_gids=FALSE. ganesha.log showed that it tried and failed to resolve the UID to a name, but allowed the operation nonetheless: 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_uid_to_name failed with code -2. 
2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 779 failed, using numeric owner With manage_gids=TRUE, the client received permission denied and ganesha.log showed the GID query failing: 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No matching password record found for uid 779 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch managed_gids failed Hope this helps, Andy Kurth / NC State University On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello, > > > > a local account on a nfs client couldn?t write to a ganesha nfs export > even with directory permissions 777. The solution was to create the account > on the ganesha servers, too. > > > > Please can you confirm that this is the intended behaviour? is there an > option to change this and to map unknown accounts to nobody instead? We > often have embedded Linux appliances or similar as nfs clients which need > to place some data on the nfs exports using uid/gid of local accounts. > > > > We manage gids on the server side and allow NFS v3 client access only. > > > > I crosspost this to ganesha support and to the gpfsug mailing list. > > > > Thank you, > > > > Heiner Billich > > > > ganesha version: 2.5.3-ibm028.00.el7.x86_64 > > > > the ganesha config > > > > CacheInode > > { > > fd_hwmark_percent=60; > > fd_lwmark_percent=20; > > fd_limit_percent=90; > > lru_run_interval=90; > > entries_hwmark=1500000; > > } > > NFS_Core_Param > > { > > clustered=TRUE; > > rpc_max_connections=10000; > > heartbeat_freq=0; > > mnt_port=33247; > > nb_worker=256; > > nfs_port=2049; > > nfs_protocols=3,4; > > nlm_port=33245; > > rquota_port=33246; > > rquota_port=33246; > > short_file_handle=FALSE; > > mount_path_pseudo=true; > > } > > GPFS > > { > > fsal_grace=FALSE; > > fsal_trace=TRUE; > > } > > NFSv4 > > { > > delegations=FALSE; > > domainname=virtual1.com; > > grace_period=60; > > lease_lifetime=60; > > } > > Export_Defaults > > { > > access_type=none; > > anonymous_gid=-2; > > anonymous_uid=-2; > > manage_gids=TRUE; > > nfs_commit=FALSE; > > privilegedport=FALSE; > > protocols=3,4; > > sectype=sys; > > squash=root_squash; > > transports=TCP; > > } > > > > one export > > > > # === START /**** id=206 nclients=3 === > > EXPORT { > > Attr_Expiration_Time=60; > > Delegations=none; > > Export_id=206; > > Filesystem_id=42.206; > > MaxOffsetRead=18446744073709551615; > > MaxOffsetWrite=18446744073709551615; > > MaxRead=1048576; > > MaxWrite=1048576; > > Path="/****"; > > PrefRead=1048576; > > PrefReaddir=1048576; > > PrefWrite=1048576; > > Pseudo="/****"; > > Tag="****"; > > UseCookieVerifier=false; > > FSAL { > > Name=GPFS; > > } > > CLIENT { > > # === ****/X12SA === > > Access_Type=RW; > > Anonymous_gid=-2; > > Anonymous_uid=-2; > > Clients=X.Y.A.B/24; > > Delegations=none; > > Manage_Gids=TRUE; > > NFS_Commit=FALSE; > > PrivilegedPort=FALSE; > > Protocols=3; > > SecType=SYS; > > Squash=Root; > > Transports=TCP; > > } > > ?. 
> > -- > > Paul Scherrer Institut > > Heiner Billich > > System Engineer Scientific Computing > > Science IT / High Performance Computing > > WHGA/106 > > Forschungsstrasse 111 > > 5232 Villigen PSI > > Switzerland > > > > Phone +41 56 310 36 02 > > heiner.billich at psi.ch > > https://www.psi.ch > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Andy Kurth* Research Storage Specialist NC State University Office of Information Technology P: 919-513-4090 311A Hillsborough Building Campus Box 7109 Raleigh, NC 27695 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jan 25 18:07:06 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 25 Jan 2019 18:07:06 +0000 Subject: [gpfsug-discuss] FW: 'Flash (Alert): IBM Spectrum Scale (GPFS) V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may result in undetected data corruption or... Message-ID: [cid:forums.png] gpfs at us.ibm.com created a topic named Flash (Alert): IBM Spectrum Scale (GPFS) V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may result in undetected data corruption or a recall failure in the General Parallel File System - Announce (GPFS - Announce) forum. Abstract IBM has identified a problem in IBM Spectrum Scale V4.1.1.0 through 5.0.1.1, in which under some conditions reading a DMAPI-migrated file may return zeroes instead of the actual data. Further, a DMAPI-migrate operation or writing to a DMAPI-migrated file may cause the size of the stub file to be updated incorrectly, which may cause a mismatch between the file size recorded in the stub file and in the migrated object. This may result in failure of a manual or transparent recall, when triggered by a subsequent read from or write to the file. See the complete bulletin at: http://www.ibm.com/support/docview.wss?uid=ibm10741243 Open this item Posting Date: Friday, January 25, 2019 at 11:31:20 AM EST To unsubscribe or change settings, please go to your developerWorks community Settings. This is a notification sent from developerWorks community. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 289 bytes Desc: image001.png URL: From S.J.Thompson at bham.ac.uk Fri Jan 25 18:28:27 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Jan 2019 18:28:27 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>, Message-ID: Note there are other limitations introduced by setting manage_gids. Whilst you get round the 16 group limit, instead ACLs are not properly interpreted to provide user access when an ACL is in place. In a PMR were told the only was around this would be to user sec_krb. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Andy Kurth [andy_kurth at ncsu.edu] Sent: 25 January 2019 16:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] does ganesha deny access for unknown UIDs? I believe this is occurring because of the manage_gids=TRUE setting. The purpose of this setting is to overcome the AUTH_SYS 16 group limit. 
If true, Ganesha takes the UID and resolves all of the GIDs on the server. If false, the GIDs sent by the client are used. I ran a quick test by creating a local user on the client and exporting 2 shares with 777 permissions, one with manage_gids=TRUE and one with FALSE. The user could view the share and create files with manage_gids=FALSE. ganesha.log showed that it tried and failed to resolve the UID to a name, but allowed the operation nonetheless: 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_uid_to_name failed with code -2. 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 779 failed, using numeric owner With manage_gids=TRUE, the client received permission denied and ganesha.log showed the GID query failing: 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No matching password record found for uid 779 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch managed_gids failed Hope this helps, Andy Kurth / NC State University On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) > wrote: Hello, a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too. Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports using uid/gid of local accounts. We manage gids on the server side and allow NFS v3 client access only. I crosspost this to ganesha support and to the gpfsug mailing list. Thank you, Heiner Billich ganesha version: 2.5.3-ibm028.00.el7.x86_64 the ganesha config CacheInode { fd_hwmark_percent=60; fd_lwmark_percent=20; fd_limit_percent=90; lru_run_interval=90; entries_hwmark=1500000; } NFS_Core_Param { clustered=TRUE; rpc_max_connections=10000; heartbeat_freq=0; mnt_port=33247; nb_worker=256; nfs_port=2049; nfs_protocols=3,4; nlm_port=33245; rquota_port=33246; rquota_port=33246; short_file_handle=FALSE; mount_path_pseudo=true; } GPFS { fsal_grace=FALSE; fsal_trace=TRUE; } NFSv4 { delegations=FALSE; domainname=virtual1.com; grace_period=60; lease_lifetime=60; } Export_Defaults { access_type=none; anonymous_gid=-2; anonymous_uid=-2; manage_gids=TRUE; nfs_commit=FALSE; privilegedport=FALSE; protocols=3,4; sectype=sys; squash=root_squash; transports=TCP; } one export # === START /**** id=206 nclients=3 === EXPORT { Attr_Expiration_Time=60; Delegations=none; Export_id=206; Filesystem_id=42.206; MaxOffsetRead=18446744073709551615; MaxOffsetWrite=18446744073709551615; MaxRead=1048576; MaxWrite=1048576; Path="/****"; PrefRead=1048576; PrefReaddir=1048576; PrefWrite=1048576; Pseudo="/****"; Tag="****"; UseCookieVerifier=false; FSAL { Name=GPFS; } CLIENT { # === ****/X12SA === Access_Type=RW; Anonymous_gid=-2; Anonymous_uid=-2; Clients=X.Y.A.B/24; Delegations=none; Manage_Gids=TRUE; NFS_Commit=FALSE; PrivilegedPort=FALSE; Protocols=3; SecType=SYS; Squash=Root; Transports=TCP; } ?. 
-- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Andy Kurth Research Storage Specialist NC State University Office of Information Technology P: 919-513-4090 311A Hillsborough Building Campus Box 7109 Raleigh, NC 27695 From mnaineni at in.ibm.com Fri Jan 25 19:38:27 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 25 Jan 2019 19:38:27 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: References: , <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Sat Jan 26 01:32:59 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Sat, 26 Jan 2019 09:32:59 +0800 Subject: [gpfsug-discuss] Announcing 2019 March 11th Singapore Spectrum Scale User Group event - call for user case speakers Message-ID: Hello, This is the announcement for the Spectrum Scale Usergroup Singapore on Monday 11th March 2019, Suntec Convention and Exhibition Centre, Singapore. This event is being held in conjunction with SCA19 https://sc-asia.org/ All current Singapore Spectrum Scale User Group event details can be found here: http://bit.ly/2FRur9d We are calling for user case speakers please ? let Ulf, Xiang or myself know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. We are looking forwards to a great Usergroup in a fabulous venue. Thanks again to NSCC and IBM for helping to arrange the venue and event booking. Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au From Renar.Grunenberg at huk-coburg.de Mon Jan 28 08:36:45 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 28 Jan 2019 08:36:45 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: Message-ID: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> Hallo Truong Vu, unfortunality the results are the same, the cmd-responce are not what we want. Ok, we want to analyze something with the trace facility and came to following link in the knowledge center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm The docu mentioned that we must copy to windows files, tracefmt.exe and tracelog.exe, but the first one are only available in the DDK-Version 7.1 (W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify where I can find the mentioned files. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu Gesendet: Donnerstag, 24. Januar 2019 19:18 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. [Inactive hide details for gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gp]gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" > To: 'gpfsug main discussion list' > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. 
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' > Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. 
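
Before settling on moving quorum off Windows, it may be worth pinning down whether the slow step really is the plain file removal under Cygwin, since the DEBUG=1 output earlier in this thread stalls around the temp-file cleanup and an on-access virus scanner is a common cause of exactly that. The following is a purely diagnostic sketch; the directory, file count and sizes are arbitrary choices.

# Diagnostic sketch, run from a Cygwin bash prompt on one of the Windows
# nodes. Creates a handful of small scratch files and times a plain /bin/rm
# on them; directory, count and size are arbitrary.
tmpdir=/var/mmfs/tmp
for i in $(seq 1 20); do
    dd if=/dev/zero of="$tmpdir/rmtest.$$.$i" bs=1k count=1 2>/dev/null
done
time /bin/rm -f "$tmpdir"/rmtest.$$.*

If the rm itself takes seconds, the realtime-scanning exclusion for the Cygwin tree suggested further down the thread is the first thing to try; if it is fast, the delay is more likely elsewhere in the command path.
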
-Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From scale at us.ibm.com Tue Jan 29 00:20:47 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 28 Jan 2019 16:20:47 -0800 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> References: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> Message-ID: Hello Renar, I have WDK 8.1 installed and it does come with trace*.exe. Check this out: https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/tracefmt If not the WDK, did you try your SDK/VisualStudio folders as indicated in the above link? Nevertheless, I have uploaded trace*.exe here for you to download: ftp testcase.software.ibm.com. Login as anonymous and provide your email as password. cd /fromibm/aix. mget trace*.exe. This site gets scrubbed often, hence download soon before they get deleted. 
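
For convenience, the manual steps above can be run in one go with a scripted ftp session. This is only a sketch of what was just described: it assumes a command-line ftp client is available, that the files are still staged (the site is scrubbed regularly), and you@example.com stands in for your own e-mail address.

# Sketch of the retrieval steps described above; substitute your own
# e-mail address for the anonymous password and run before the staging
# area is purged.
ftp -n testcase.software.ibm.com <<'EOF'
user anonymous you@example.com
binary
prompt
cd /fromibm/aix
mget trace*.exe
bye
EOF

Here "binary" avoids any newline translation of the .exe files and "prompt" turns off the per-file confirmation that mget would otherwise ask for. The tracefmt.exe and tracelog.exe files can then be copied to the Windows nodes as described in the knowledge-center page referenced earlier in the thread.
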
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "gpfsug-discuss at spectrumscale.org" Date: 01/28/2019 12:38 AM Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Truong Vu, unfortunality the results are the same, the cmd-responce are not what we want. Ok, we want to analyze something with the trace facility and came to following link in the knowledge center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm The docu mentioned that we must copy to windows files, tracefmt.exe and tracelog.exe, but the first one are only available in the DDK-Version 7.1 (W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify where I can find the mentioned files. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu Gesendet: Donnerstag, 24. Januar 2019 19:18 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. 
gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. /bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. 
Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. 
We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190123/eff7ad74/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=_PEp_I-F3uzCglEj5raDY1xo2-W6myUCIX1ysChh0lo&s=k9JU3wc7KoJj1VWVVSjjAekQcIEfeJazMkT3BBME-SY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From cblack at nygenome.org Tue Jan 29 17:23:49 2019 From: cblack at nygenome.org (Christopher Black) Date: Tue, 29 Jan 2019 17:23:49 +0000 Subject: [gpfsug-discuss] Querying size of snapshots Message-ID: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> We have some large filesets (PB+) and filesystems where I would like to monitor delete rates and estimate how much space we will get back as snapshots expire. We only keep 3-4 daily snapshots on this filesystem due to churn. I?ve tried to query the sizes of snapshots using the following command: mmlssnapshot fsname -d --block-size 1T However, this has run for over an hour without producing any results. Metadata is all on flash and I?m not sure why this is taking so long. Does anyone have any insight on this or alternate methods for getting estimates of snapshot sizes? Best, Chris PS I am aware of the warning in docs about the -d option. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Jan 29 18:24:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 29 Jan 2019 15:24:17 -0300 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> Message-ID: 1. First off, let's RTFM ... -d Displays the amount of storage that is used by the snapshot. This operation requires an amount of time that is proportional to the size of the file system; therefore, it can take several minutes or even hours on a large and heavily-loaded file system. This optional parameter can impact overall system performance. Avoid running the mmlssnapshot command with this parameter frequently or during periods of high file system activity. SOOOO.. there's that. 2. Next you may ask, HOW is that? Snapshots are maintained with a "COW" strategy -- They are created quickly, essentially just making a record that the snapshot was created and at such and such time -- when the snapshot is the same as the "live" filesystem... 
Then over time, each change to a block of data in live system requires that a copy is made of the old data block and that is associated with the most recently created snapshot.... SO, as more and more changes are made to different blocks over time the snapshot becomes bigger and bigger. How big? Well it seems the current implementation does not keep a "simple counter" of the number of blocks -- but rather, a list of the blocks that were COW'ed.... So when you come and ask "How big"... GPFS has to go traverse the file sytem metadata and count those COW'ed blocks.... 3. So why not keep a counter? Well, it's likely not so simple. For starters GPFS is typically running concurrently on several or many nodes... And probably was not deemed worth the effort ..... IF a convincing case could be made, I'd bet there is a way... to at least keep approximate numbers, log records, exact updates periodically, etc, etc -- similar to the way space allocation and accounting is done for the live file system... -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Tue Jan 29 18:43:24 2019 From: cblack at nygenome.org (Christopher Black) Date: Tue, 29 Jan 2019 18:43:24 +0000 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> Message-ID: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Thanks for the quick and detailed reply! I had read the manual and was aware of the warnings about -d (mentioned in my PS). On systems with high churn (lots of temporary files, lots of big and small deletes along with many new files), I?ve previously used estimates of snapshot size as a useful signal on whether we can expect to see an increase in available space over the next few days as snapshots expire. I?ve used this technique on a few different more mainstream storage systems, but never on gpfs. I?d find it useful to have a similar way to monitor ?space to be freed pending snapshot deletes? on gpfs. It sounds like there is not an existing solution for this so it would be a request for enhancement. I?m not sure how much overhead there would be keeping a running counter for blocks changed since snapshot creation or if that would completely fall apart on large systems or systems with many snapshots. If that is a consideration even having only an estimate for the oldest snapshot would be useful, but I realize that can depend on all the other later snapshots as well. Perhaps an overall ?size of all snapshots? would be easier to manage and would still be useful to us. I don?t need this number to be 100% accurate, but a low or floor estimate would be very useful. Is anyone else interested in this? Do other people have other ways to estimate how much space they will get back as snapshots expire? Is there a more efficient way of making such an estimate available to admins other than running an mmlssnapshot -d every night and recording the output? Thanks all! Chris From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Tuesday, January 29, 2019 at 1:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Querying size of snapshots 1. First off, let's RTFM ... -d Displays the amount of storage that is used by the snapshot. This operation requires an amount of time that is proportional to the size of the file system; therefore, it can take several minutes or even hours on a large and heavily-loaded file system. 
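
Pending such an enhancement, the "run it nightly and record the output" idea raised later in this message can at least be automated so the numbers are on hand when space gets tight. A rough sketch follows; the filesystem name, log path and script name are placeholders, and given the cost of -d discussed above it belongs in a quiet window.

#!/bin/bash
# Rough sketch: append a timestamped "mmlssnapshot -d" report to a log so
# that snapshot growth (and hence space pending return) can be tracked day
# to day. Filesystem name and log path are placeholders; run off-peak.
FS=fsname
LOG=/var/log/gpfs-snapshot-sizes-${FS}.log
{
    echo "=== $(date -u '+%Y-%m-%dT%H:%M:%SZ') ==="
    /usr/lpp/mmfs/bin/mmlssnapshot "$FS" -d --block-size 1T
} >> "$LOG" 2>&1

Driven from cron, for example 30 2 * * * /usr/local/sbin/record-snapshot-sizes.sh (a hypothetical name and schedule), the day-over-day differences give the floor estimate being asked for here without anyone having to remember to run the command by hand.
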
This optional parameter can impact overall system performance. Avoid running the mmlssnapshot command with this parameter frequently or during periods of high file system activity. SOOOO.. there's that. 2. Next you may ask, HOW is that? Snapshots are maintained with a "COW" strategy -- They are created quickly, essentially just making a record that the snapshot was created and at such and such time -- when the snapshot is the same as the "live" filesystem... Then over time, each change to a block of data in live system requires that a copy is made of the old data block and that is associated with the most recently created snapshot.... SO, as more and more changes are made to different blocks over time the snapshot becomes bigger and bigger. How big? Well it seems the current implementation does not keep a "simple counter" of the number of blocks -- but rather, a list of the blocks that were COW'ed.... So when you come and ask "How big"... GPFS has to go traverse the file sytem metadata and count those COW'ed blocks.... 3. So why not keep a counter? Well, it's likely not so simple. For starters GPFS is typically running concurrently on several or many nodes... And probably was not deemed worth the effort ..... IF a convincing case could be made, I'd bet there is a way... to at least keep approximate numbers, log records, exact updates periodically, etc, etc -- similar to the way space allocation and accounting is done for the live file system... ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Jan 29 19:19:12 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 29 Jan 2019 20:19:12 +0100 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Message-ID: You could put snapshot data in a separate storage pool. Then it should be visible how much space it occupies, but it?s a bit hard to see how this will be usable/manageable.. -jf tir. 29. jan. 2019 kl. 20:08 skrev Christopher Black : > Thanks for the quick and detailed reply! I had read the manual and was > aware of the warnings about -d (mentioned in my PS). > > On systems with high churn (lots of temporary files, lots of big and small > deletes along with many new files), I?ve previously used estimates of > snapshot size as a useful signal on whether we can expect to see an > increase in available space over the next few days as snapshots expire. > I?ve used this technique on a few different more mainstream storage > systems, but never on gpfs. > > I?d find it useful to have a similar way to monitor ?space to be freed > pending snapshot deletes? on gpfs. It sounds like there is not an existing > solution for this so it would be a request for enhancement. 
> > I?m not sure how much overhead there would be keeping a running counter > for blocks changed since snapshot creation or if that would completely fall > apart on large systems or systems with many snapshots. If that is a > consideration even having only an estimate for the oldest snapshot would be > useful, but I realize that can depend on all the other later snapshots as > well. Perhaps an overall ?size of all snapshots? would be easier to manage > and would still be useful to us. > > I don?t need this number to be 100% accurate, but a low or floor estimate > would be very useful. > > > > Is anyone else interested in this? Do other people have other ways to > estimate how much space they will get back as snapshots expire? Is there a > more efficient way of making such an estimate available to admins other > than running an mmlssnapshot -d every night and recording the output? > > > > Thanks all! > > Chris > > > > *From: * on behalf of Marc A > Kaplan > *Reply-To: *gpfsug main discussion list > *Date: *Tuesday, January 29, 2019 at 1:24 PM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] Querying size of snapshots > > > > 1. First off, let's RTFM ... > > *-d *Displays the amount of storage that is used by the snapshot. > This operation requires an amount of time that is proportional to the size > of the file system; therefore, > it can take several minutes or even hours on a large and heavily-loaded > file system. > This optional parameter can impact overall system performance. Avoid > running the * mmlssnapshot* > command with this parameter frequently or during periods of high file > system activity. > > SOOOO.. there's that. > > 2. Next you may ask, HOW is that? > > Snapshots are maintained with a "COW" strategy -- They are created > quickly, essentially just making a record that the snapshot was created and > at such and such time -- when the snapshot is the same as the "live" > filesystem... > > Then over time, each change to a block of data in live system requires > that a copy is made of the old data block and that is associated with the > most recently created snapshot.... SO, as more and more changes are made > to different blocks over time the snapshot becomes bigger and bigger. How > big? Well it seems the current implementation does not keep a "simple > counter" of the number of blocks -- but rather, a list of the blocks that > were COW'ed.... So when you come and ask "How big"... GPFS has to go > traverse the file sytem metadata and count those COW'ed blocks.... > > 3. So why not keep a counter? Well, it's likely not so simple. For > starters GPFS is typically running concurrently on several or many > nodes... And probably was not deemed worth the effort ..... IF a > convincing case could be made, I'd bet there is a way... to at least keep > approximate numbers, log records, exact updates periodically, etc, etc -- > similar to the way space allocation and accounting is done for the live > file system... > > > ------------------------------ > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jan 29 21:37:08 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 29 Jan 2019 22:37:08 +0100 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org><369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Wed Jan 30 13:16:22 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 30 Jan 2019 13:16:22 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space Message-ID: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rg1 Type: application/octet-stream Size: 13340 bytes Desc: rg1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rg2 Type: application/octet-stream Size: 13340 bytes Desc: rg2 URL: From abeattie at au1.ibm.com Wed Jan 30 14:53:47 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 30 Jan 2019 14:53:47 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Jan 30 20:25:20 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 30 Jan 2019 15:25:20 -0500 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Message-ID: Alvise, Could you send us the output of the following commands from both server nodes. mmfsadm dump nspdclient > /tmp/dump_nspdclient. mmfsadm dump pdisk > /tmp/dump_pdisk. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2019 08:24 AM Subject: [gpfsug-discuss] Unbalanced pdisk free space Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise [attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment "rg2" deleted by Brian Herr/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=QDZ-afehEgpYi3JGRd8q6rHgo4rb8gVu_VKQwg4MwEs&s=5bEFHRU7zk-nRK_d20vJBngQOOkSLWT1vvtcDNKD584&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 30 20:51:49 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 30 Jan 2019 20:51:49 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= Message-ID: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 30 21:02:26 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 30 Jan 2019 21:02:26 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Message-ID: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. 
So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Jan 30 21:16:51 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 30 Jan 2019 18:16:51 -0300 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Message-ID: We have (pre)shutdown and pre(startup) ... Trap and record both... If you see a startup without a matching shutdown you know the shutdown never happened, because GPFS crashed. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 01/30/2019 05:52 PM Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Sent by: gpfsug-discuss-bounces at spectrumscale.org Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=oBQHDWo5PVKthJjmbVrQyqSrkuFZEcMQb_tXtvcKepE&s=HfF_wArTvc-i4wLfATXbwrImRT-w0mKG8mhctBJFLCI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Wed Jan 30 21:52:48 2019 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Wed, 30 Jan 2019 21:52:48 +0000 Subject: [gpfsug-discuss] =?windows-1252?q?Node_=91crash_and_restart=92_ev?= =?windows-1252?q?ent_using_GPFS_callback=3F?= In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>, <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: <063B3F21-8695-4454-8D1A-B1734B1AD436@med.mun.ca> Could you get away with running ?mmdiag ?stats? and inspecting the uptime information it provides? Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. 
John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 On Jan 30, 2019, at 5:32 PM, Sanchez, Paul > wrote: There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jan 31 01:19:47 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 31 Jan 2019 01:19:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= Message-ID: <554E186D-30BD-4E7D-859C-339F5DDAD442@nuance.com> Actually, I think ?preShutdown? will do it since it passes the type of shutdown ?abnormal? for a crash to the call back - I can use that to send a Slack message. mmaddcallback node-abort --event preShutdown --command /usr/local/sbin/callback-test.sh --parms "%eventName %reason" and you get either: preShutdown normal preShutdown abnormal Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, January 30, 2019 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? We have (pre)shutdown and pre(startup) ... Trap and record both... If you see a startup without a matching shutdown you know the shutdown never happened, because GPFS crashed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alastair.smith at ucl.ac.uk Wed Jan 30 14:11:08 2019 From: alastair.smith at ucl.ac.uk (Smith, Alastair) Date: Wed, 30 Jan 2019 14:11:08 +0000 Subject: [gpfsug-discuss] Job opportunity at UCL Research Data Services Message-ID: Dear List Members, We would like to draw you attention to a job opportunity at UCL for a Senior Research Data Systems Engineer. The is a technical role in the Research Data Services Group, part of UCL's large and well-established Research IT Services team. 
The Senior Data Systems Engineer leads the development of technical strategy for Research Data Services at UCL. The successful applicant will ensure that appropriate technologies and workflows are used to address research data management requirements across the institution, particularly those relating to data storage and access. The Research Data Services Group provides petabyte-scale data storage for active research projects, and is about to launch a long-term data repository service. Over the coming years, the Group will be building an integrated suite of services to support data management from planning to re-use, and the successful candidate will play an important role in the design and operation of these services. The post comes with a competitive salary and a central London working location. The closing date for applications it 2nd February. Further particulars and a link to the application form are available from https://tinyurl.com/ucljobs-rdse. -|-|-|-|-|-|-|-|-|-|-|-|-|- Dr Alastair Smith Senior research data systems engineer Research Data Services, RITS Information Services Division University College London 1 St Martin's- Le-Grand London EC1A 4AS -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Thu Jan 31 09:48:12 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 31 Jan 2019 09:48:12 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE8045267E32C0@MBX114.d.ethz.ch> They're attached. Thanks! Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com] Sent: Wednesday, January 30, 2019 9:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unbalanced pdisk free space Alvise, Could you send us the output of the following commands from both server nodes. * mmfsadm dump nspdclient > /tmp/dump_nspdclient. * mmfsadm dump pdisk > /tmp/dump_pdisk. * Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2019 08:24 AM Subject: [gpfsug-discuss] Unbalanced pdisk free space Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise [attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment "rg2" deleted by Brian Herr/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_nspdclient.sf-dssio-1 Type: application/octet-stream Size: 570473 bytes Desc: dump_nspdclient.sf-dssio-1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_nspdclient.sf-dssio-2 Type: application/octet-stream Size: 566924 bytes Desc: dump_nspdclient.sf-dssio-2 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_pdisk.sf-dssio-1 Type: application/octet-stream Size: 682312 bytes Desc: dump_pdisk.sf-dssio-1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_pdisk.sf-dssio-2 Type: application/octet-stream Size: 619497 bytes Desc: dump_pdisk.sf-dssio-2 URL: From heiner.billich at psi.ch Thu Jan 31 14:56:21 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 31 Jan 2019 14:56:21 +0000 Subject: [gpfsug-discuss] Token manager - how to monitor performance? Message-ID: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch> Hello, Sorry for coming up with this never-ending story. I know that token management is mainly autoconfigured and even the placement of token manager nodes is no longer under user control in all cases. Still I would like to monitor this component to see if we are close to some limit like memory or rpc rate. Especially as we?ll do some major changes to our setup soon. I would like to monitor the performance of our token manager nodes to get warned _before_ we get performance issues. Any advice is welcome. Ideally I would like collect some numbers and pass them on to influxdb or similar. I didn?t find anything in perfmon/zimon that seemed to match. I could imagine that numbers like ?number of active tokens? and ?number of token operations? per manager would be helpful. Or ?# of rpc calls per second?. And maybe ?number of open files?, ?number of token operations?, ?number of tokens? for clients. And maybe some percentage of used token memory ? and cache hit ratio ? This would also help to tune ? 
like if a client does very many token operations or rpc calls maybe I should increase maxFilesToCache. The above is just to illustrate, as token management is complicated the really valuable metrics may be different. Or am I too anxious and should wait and see instead? cheers, Heiner Heiner Billich -- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 31 15:11:24 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 31 Jan 2019 17:11:24 +0200 Subject: [gpfsug-discuss] Token manager - how to monitor performance? In-Reply-To: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch> References: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch> Message-ID: Hi, I agree that we should potentially add mode metrics, but for a start, I would look into mmdiag --memory and mmdiag --tokenmgr (the latter show different output on a token server). Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Billich Heinrich Rainer (PSI)" To: gpfsug main discussion list Date: 31/01/2019 16:56 Subject: [gpfsug-discuss] Token manager - how to monitor performance? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, Sorry for coming up with this never-ending story. I know that token management is mainly autoconfigured and even the placement of token manager nodes is no longer under user control in all cases. Still I would like to monitor this component to see if we are close to some limit like memory or rpc rate. Especially as we?ll do some major changes to our setup soon. I would like to monitor the performance of our token manager nodes to get warned _before_ we get performance issues. Any advice is welcome. Ideally I would like collect some numbers and pass them on to influxdb or similar. I didn?t find anything in perfmon/zimon that seemed to match. I could imagine that numbers like ?number of active tokens? and ?number of token operations? per manager would be helpful. Or ?# of rpc calls per second?. And maybe ?number of open files?, ?number of token operations?, ?number of tokens? for clients. And maybe some percentage of used token memory ? and cache hit ratio ? This would also help to tune ? like if a client does very many token operations or rpc calls maybe I should increase maxFilesToCache. The above is just to illustrate, as token management is complicated the really valuable metrics may be different. Or am I too anxious and should wait and see instead? cheers, Heiner Heiner Billich -- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=mLPyKeOa1gNDrORvEXBgMw&m=J5n3Wsk1f6CsyL867jkmS3P2BYZDfkPS6GB9dShnYcI&s=YFTWUM3MQu8C1MitRnyPnYQ_wMtjj3Uwmif6gJUoLgc&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 30 21:15:48 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 30 Jan 2019 21:15:48 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: Hi Bob, We use the nodeLeave callback to detect node expels ? for what you?re wanting to do I wonder if nodeJoin might work?? If a node joins the cluster and then has an uptime of a few minutes you could go looking in /tmp/mmfs. HTH... -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 30, 2019, at 3:02 PM, Sanchez, Paul > wrote: There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cccd012a939124326a53908d686f64117%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636844789557921185&sdata=9bMPd%2F%2B%2Babt6IdeFYcdznPBQwPrMLFsXHTBYISlyYGM%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Jan 31 15:40:50 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 31 Jan 2019 12:40:50 -0300 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com><9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: Various "leave" / join events may be interesting ... But you've got to consider that an abrupt failure of several nodes is not necessarily recorded anywhere! For example, because the would be recording devices might all lose power at the same time. 
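A minimal sketch of the trap-and-compare approach Marc describes, assuming the script is registered for both preStartup and preShutdown with mmaddcallback and handed %eventName and %reason; the script name, state file and log file are illustrative, not anything shipped with Scale:

#!/bin/bash
# Illustrative only - register for both events, for example:
#   mmaddcallback crashTrap --event preStartup,preShutdown \
#     --command /usr/local/sbin/gpfs-crash-trap.sh --parms "%eventName %reason"
EVENT="$1"
REASON="$2"                         # only meaningful for preShutdown (normal/abnormal)
STATE=/var/local/gpfs-last-event    # assumed local state file
LOG=/var/log/gpfs-crash-trap.log

echo "$(date -Is) ${EVENT} ${REASON}" >> "$LOG"

case "$EVENT" in
  preShutdown)
    # Remember that a shutdown (normal or abnormal) was at least reported
    echo "preShutdown ${REASON}" > "$STATE"
    ;;
  preStartup)
    # Startup with no recorded shutdown: GPFS most likely crashed or was killed
    if ! grep -q '^preShutdown' "$STATE" 2>/dev/null; then
      echo "$(date -Is) startup without matching shutdown - possible crash" >> "$LOG"
      ls -l /tmp/mmfs >> "$LOG" 2>/dev/null   # any dumps left behind?
    fi
    : > "$STATE"                    # reset for the next cycle
    ;;
esac

As Paul and Marc both point out, this cannot catch every failure (a node that loses power records nothing at all), so it complements rather than replaces an out-of-band cron check.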
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jan 31 15:46:38 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 31 Jan 2019 15:46:38 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: <572FF01C-A82D-45FD-AB34-A897BFE59325@nuance.com> A better way to detect node expels is to install the expelnode into /var/mmfs/etc/ (sample in /usr/lpp/mmfs/samples/expelnode.sample) - put this on your manager nodes. It runs on every expel and you can customize it pretty easily. We generate a Slack message to a specific channel: GPFS Node Expel nrg1 APP [1:56 AM] nrg1-gpfs01 Expelling node gnj-r05r05u30, other node cnt-r04r08u40 Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Thursday, January 31, 2019 at 9:19 AM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Hi Bob, We use the nodeLeave callback to detect node expels ? for what you?re wanting to do I wonder if nodeJoin might work?? If a node joins the cluster and then has an uptime of a few minutes you could go looking in /tmp/mmfs. HTH... -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 30, 2019, at 3:02 PM, Sanchez, Paul > wrote: There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. 
What I'm really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cccd012a939124326a53908d686f64117%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636844789557921185&sdata=9bMPd%2F%2B%2Babt6IdeFYcdznPBQwPrMLFsXHTBYISlyYGM%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Thu Jan 31 20:44:25 2019 From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair)) Date: Thu, 31 Jan 2019 20:44:25 +0000 Subject: [gpfsug-discuss] Call for input & save the date Message-ID: <213C4D17-C0D2-4883-834F-7E2E00B4EE3F@spectrumscale.org> Hi All, We've just published the main dates for 2019 Spectrum Scale meetings on the user group website at: https://www.spectrumscaleug.org/ Please take a look over the list of events and pencil them in your diary! (some of those later in the year are tentative and there are a couple more that might get added in some other territories). Myself, Kristy, Bob, Chris and Ulf are currently having some discussion on the topics we'd like to have covered in the various user group meetings. If you have any specific topics you'd like to hear about, then please let me know in the next few days... we can't promise we can get a speaker, but if you don't let us know we can't try! As usual, we'll be looking for user speakers for all of our events. The user group events only work well if we have people talking about their uses of Spectrum Scale, so please think about offering a talk and let us know! Thanks Simon UK Group Chair -------------- next part -------------- An HTML attachment was scrubbed... URL:
From roblogie at au1.ibm.com Tue Jan 8 21:49:51 2019 From: roblogie at au1.ibm.com (Rob Logie) Date: Tue, 8 Jan 2019 21:49:51 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Message-ID: Hi All Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes. Thanks ! Regards, Rob Logie IT Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Tue Jan 8 21:53:51 2019 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 8 Jan 2019 16:53:51 -0500 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: Message-ID: Adding Ingo Meents for response From: "Rob Logie" To: gpfsug-discuss at spectrumscale.org Date: 01/08/2019 04:50 PM Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes. Thanks ! Regards, Rob Logie IT Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From arc at b4restore.com Wed Jan 9 10:25:13 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 10:25:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ?
Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Wed Jan 9 11:16:49 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 9 Jan 2019 11:16:49 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 12:19:30 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 12:19:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: on behalf of "arc at b4restore.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From arc at b4restore.com Wed Jan 9 13:23:17 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 13:23:17 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: References: Message-ID: <1886db2cdf074bf0aaa151c395d300d5@B4RWEX01.internal.b4restore.com> Hi Andrew, Where can I request such a feature? ? Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Andrew Beattie Sendt: 9. januar 2019 12:17 Til: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. Andi, All the CES nodes in the same cluster will share the same protocol exports if you want to separate them you need to create remote mount clusters and export the additional protocols via the remote mount it would actually be a useful RFE to have the ablity to create CES groups attached to the base cluster and by group create exports of different protocols, but its not available today. 
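For reference, the group-based steering Simon describes comes down to a handful of commands. A sketch with assumed node names (ces1-ces4), group names and addresses; the exact option spellings should be verified against the mmchnode and mmces man pages for the release in use:

# Assign two protocol nodes to an "smb" CES group and two to an "obj" group
mmchnode --ces-group smb -N ces1,ces2
mmchnode --ces-group obj -N ces3,ces4

# Tie the client-facing addresses to the matching group, so the DNS names
# handed to SMB users resolve to IPs that stay on the SMB nodes and object
# traffic stays on the object nodes
mmces address add --ces-ip 10.0.0.11,10.0.0.12 --ces-group smb
mmces address add --ces-ip 10.0.0.21,10.0.0.22 --ces-group obj

As noted above, every enabled protocol still runs on every CES node, so this only steers well-behaved clients; anything determined to hit the "wrong" address has to be stopped by a firewall.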
Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Andi Rhod Christiansen > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Date: Wed, Jan 9, 2019 8:31 PM Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From arc at b4restore.com Wed Jan 9 13:24:30 2019 From: arc at b4restore.com (Andi Rhod Christiansen) Date: Wed, 9 Jan 2019 13:24:30 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> References: <5ABB423F-71AF-4469-9FDA-589EA8872B86@bham.ac.uk> Message-ID: Hi Simon, It was actually also the only solution I found if I want to keep them within the same cluster ? Thanks for the reply, I will see what we figure out ! Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Simon Thompson Sendt: 9. januar 2019 13:20 Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: > on behalf of "arc at b4restore.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 9 14:05:48 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 9 Jan 2019 14:05:48 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. 
In-Reply-To: References: Message-ID: <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com> The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?. This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 16:35:37 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 16:35:37 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. In-Reply-To: <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com> References: , <53ec54bb621242109a789e51d61b1377@mbxtoa1.winmail.deshaw.com> Message-ID: I think only recently was remote cluster support added (though we have been doing it since CES was released). I agree that capacity licenses have freed us to implement a better solution.. no longer do we run quorum/token managers on nsd nodes to reduce socket costs. I believe socket based licenses are also about to or already no longer available for new customers (existing customers can continue to buy). Carl can probably comment on this? Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Paul.Sanchez at deshaw.com [Paul.Sanchez at deshaw.com] Sent: 09 January 2019 14:05 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?. 
This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist From aspalazz at us.ibm.com Wed Jan 9 17:21:03 2019 From: aspalazz at us.ibm.com (Aaron S Palazzolo) Date: Wed, 9 Jan 2019 17:21:03 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Jan 9 18:04:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 18:04:47 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: References: , Message-ID: Can you use node affinity within CES groups? For example I have some shiny new servers I want to normally use. If I plan maintenance, I move the IP to another shiny box. But I also have some old off support legacy hardware that I'm happy to use in a DR situation (e.g. they are in another site). So I want a group for my SMB boxes and NFS boxes, but have affinity normally, and then have old hardware in case of failure. Whilst we're on protocols, are there any restrictions on using mixed architectures? I don't recall seeing this but... E.g. my new shiny boxes are ppc64le systems and my old legacy nodes are x86. It's all ctdb locking right .. (ok maybe mixing be and le hosts would be bad) (Sure I'll take a performance hit when I fail to the old nodes, but that is better than no service). Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of aspalazz at us.ibm.com [aspalazz at us.ibm.com] Sent: 09 January 2019 17:21 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation Hey guys - I wanted to reply from the Scale development side..... 
First off, consider CES as a stack and the implications of such: - all protocols are installed on all nodes - if a specific protocol is enabled (SMB, NFS, OBJ, Block), it's enabled for all protocol nodes - if a specific protocol is started (SMB, NFS, OBJ, Block), it's started on all nodes by default, unless manually specified. As was indicated in the e-mail chain, you don't want to be removing rpms to create a subset of nodes serving various protocols as this will cause overall issues. You also don't want to manually be disabling protocols on some nodes/not others in order to achieve nodes that are 'only serving' SMB, for instance. Doing this manual stopping/starting of protocols isn't something that will adhere to failover. =============================================================== A few possible solutions if you want to segregate protocols to specific nodes are: =============================================================== 1) CES-Groups in combination with specific IPs / DNS hostnames that correspond to each protocol. - As mentioned, this can still be bypassed if someone attempts a mount using an IP/DNS name not set for their protocol. However, you could probably prevent some of this with an external firewall rule. - Using CES-Groups confines the IPs/DNS hostnames to very specific nodes 2) Firewall rules - This is best if done external to the cluster, and at a level that can restrict specific protocol traffic to specific IPs/hostnames - combine this with #1 for the best results. - Although it may work, try to stay away from crazy firewall rules on each protocol node itself as this can get confusing very quickly. It's easier if you can set this up external to the nodes. 3) Similar to above but using Node Affinity CES-IP policy - but no CES groups. - Upside is node-affinity will attempt to keep your CES-IPs associated with specific nodes. So if you restrict specific protocol traffic to specific IPs, then they'll stay on nodes you designate - Watch out for failovers. In error cases (or upgrades) where an IP needs to move to another node, it obviously can't remain on the node that's having issues. This means you may have protocol trafffic crossover when this occurs. 4) A separate remote cluster for each CES protocol - In this example, you could make fairly small remote clusters (although we recommend 2->3nodes at least for failover purposes). The local cluster would provide the storage. The remote clusters would mount it. One remote cluster could have only SMB enabled. Another remote cluster could have only OBJ enabled. etc... ------ I hope this helps a bit.... 
Regards, Aaron Palazzolo IBM Spectrum Scale Deployment, Infrastructure, Virtualization 9042 S Rita Road, Tucson AZ 85744 Phone: 520-799-5161, T/L: 321-5161 E-mail: aspalazz at us.ibm.com ----- Original message ----- From: gpfsug-discuss-request at spectrumscale.org Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Cc: Subject: gpfsug-discuss Digest, Vol 84, Issue 4 Date: Wed, Jan 9, 2019 7:13 AM Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale protocol node service separation. (Andi Rhod Christiansen) 2. Re: Spectrum Scale protocol node service separation. (Sanchez, Paul) ---------------------------------------------------------------------- Message: 1 Date: Wed, 9 Jan 2019 13:24:30 +0000 From: Andi Rhod Christiansen To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. Message-ID: Content-Type: text/plain; charset="utf-8" Hi Simon, It was actually also the only solution I found if I want to keep them within the same cluster ? Thanks for the reply, I will see what we figure out ! Venlig hilsen / Best Regards Andi Rhod Christiansen Fra: gpfsug-discuss-bounces at spectrumscale.org P? vegne af Simon Thompson Sendt: 9. januar 2019 13:20 Til: gpfsug main discussion list Emne: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. You have to run all services on all nodes ( ? ) actually its technically possible to remove the packages once protocols is running on the node, but next time you reboot the node, it will get marked unhealthy and you spend an hour working out why? But what we do to split load is have different IPs assigned to different CES groups and then assign the SMB nodes to the SMB group IPs etc ? Technically a user could still connect to the NFS (in our case) IPs with SMB protocol, but there?s not a lot we can do about that ? though our upstream firewall drops said traffic. Simon From: > on behalf of "arc at b4restore.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 9 January 2019 at 10:31 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Wed, 9 Jan 2019 14:05:48 +0000 From: "Sanchez, Paul" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale protocol node service separation. 
Message-ID: <53ec54bb621242109a789e51d61b1377 at mbxtoa1.winmail.deshaw.com> Content-Type: text/plain; charset="utf-8" The docs say: ?CES supports the following export protocols: NFS, SMB, object, and iSCSI (block). Each protocol can be enabled or disabled in the cluster. If a protocol is enabled in the CES cluster, all CES nodes serve that protocol.? Which would seem to indicate that the answer is ?no?. This kind of thing is another good reason to license Scale by storage capacity rather than by sockets (PVU). This approach was already a good idea due to the flexibility it allows to scale manager, quorum, and NSD server nodes for performance and high-availability without affecting your software licensing costs. This can result in better design and the flexibility to more quickly respond to new problems by adding server nodes. So assuming you?re not on the old PVU licensing model, it is trivial to deploy as many gateway nodes as needed to separate these into distinct remote clusters. You can create an object gateway cluster, and a CES gateway cluster each which only mounts and exports what is necessary. You can even virtualize these servers and host them on the same hardware, if you?re into that. -Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Andi Rhod Christiansen Sent: Wednesday, January 9, 2019 5:25 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation. Hi, I seem to be unable to find any information on separating protocol services on specific CES nodes within a cluster. Does anyone know if it is possible to take, lets say 4 of the ces nodes within a cluster and dividing them into two and have two of the running SMB and the other two running OBJ instead of having them all run both services? If it is possible it would be great to hear pros and cons about doing this ? Thanks in advance! Venlig hilsen / Best Regards Andi Christiansen IT Solution Specialist -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 4 ********************************************* From christof.schmitt at us.ibm.com Wed Jan 9 18:10:13 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 9 Jan 2019 18:10:13 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation In-Reply-To: References: , , Message-ID: An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Wed Jan 9 19:03:25 2019 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Wed, 9 Jan 2019 19:03:25 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.1__=8FBB09EFDFEBBB408f9e8a93df938690918c8FB at .gif Type: image/gif Size: 105 bytes Desc: not available URL: From carlz at us.ibm.com Wed Jan 9 19:19:20 2019 From: carlz at us.ibm.com (Carl Zetie) Date: Wed, 9 Jan 2019 19:19:20 +0000 Subject: [gpfsug-discuss] Spectrum Scale protocol node service separation Message-ID: ST>I believe socket based licenses are also about to or already no longer available ST>for new customers (existing customers can continue to buy). 
ST>Carl can probably comment on this? That is correct. Friday Jan 11 is the last chance for *new* customers to buy Standard Edition sockets. And as Simon says, those of you who are currently Sockets customers can remain on Sockets, buying additional licenses and renewing existing licenses. (IBM Legal requires me to add, any statement about the future is an intention, not a commitment -- but, as I've said before, as long as it's my decision to make, my intent is to keep Sockets as long as existing customers want them). And yes, one of the reasons I wanted to get away from Socket pricing is the kind of scenarios some of you brought up. Implementing the best deployment topology for your needs shouldn't be a licensing transaction. (Don't even get me started on client licenses). regards, Carl Zetie Program Director Offering Management for Spectrum Scale, IBM ---- (540) 882 9353 ][ Research Triangle Park carlz at us.ibm.com From cblack at nygenome.org Wed Jan 9 19:11:40 2019 From: cblack at nygenome.org (Christopher Black) Date: Wed, 9 Jan 2019 19:11:40 +0000 Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol In-Reply-To: References: Message-ID: <7399F5C1-A23F-4852-B912-0965E111D191@nygenome.org> We use realmd and some automation for sssd configs to get linux hosts to have local login and ssh tied to AD accounts, however we do not apply these configs on our protocol nodes. From: on behalf of Christof Schmitt Reply-To: gpfsug main discussion list Date: Wednesday, January 9, 2019 at 2:03 PM To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" , Ingo Meents Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol There is the PAM module that would forward authentication requests to winbindd: /usr/lpp/mmfs/lib64/security/pam_gpfs-winbind.so In theory that can be added to the PAM configuration in /etc/pam.d/. On the other hand, we have never tested this nor claimed support, so there might be reasons why this won't work. Other customers have configured sssd manually in addition to the Scale authentication to allow user logon and authentication for sudo. If the request here is to configure AD authentication through mmuserauth and that should also provide user logon, that should probably be treated as a feature request through RFE. Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Lyle Gayne" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Ingo Meents Subject: Re: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Date: Tue, Jan 8, 2019 2:54 PM Adding Ingo Meents for response [Inactive hide details for "Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory a]"Rob Logie" ---01/08/2019 04:50:22 PM---Hi All Is there a way to enable User Login Active Directory authentication on CES From: "Rob Logie" To: gpfsug-discuss at spectrumscale.org Date: 01/08/2019 04:50 PM Subject: [gpfsug-discuss] User Login Active Directory authentication on CES nodes with SMB protocol Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All Is there a way to enable User Login Active Directory authentication on CES nodes with SMB protocol that are joined to an AD domain. ? 
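For anyone who wants to experiment with the pam_gpfs-winbind module Christof mentions above, a minimal sketch of the PAM wiring might look like the lines below. It is explicitly untested and unsupported (as Christof notes); the target service file, stack position and control flags are guesses here, only the module path comes from his message.

    # e.g. added to /etc/pam.d/sshd (or your distro's common auth include) - untested;
    # the placement and the "sufficient" control flags are assumptions, not a recommendation
    auth     sufficient   /usr/lpp/mmfs/lib64/security/pam_gpfs-winbind.so
    account  sufficient   /usr/lpp/mmfs/lib64/security/pam_gpfs-winbind.so

The sssd-based setup that other sites use alongside the Scale authentication avoids touching the shipped module at all, which is probably the safer route until this is formally supported.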
The AD authentication is working for access to the SMB shares, but not for user login authentication on the CES nodes. Thanks ! Regards, Rob Logie IT Specialist _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 8 22:12:22 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 8 Jan 2019 22:12:22 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Message-ID: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylar2 at uw.edu Wed Jan 9 21:37:04 2019 From: skylar2 at uw.edu (Skylar Thompson) Date: Wed, 9 Jan 2019 21:37:04 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? 
In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> I suppose you could run the underlying tslsfileset, though that's probably not the answer you're looking for. Out of curiousity, what are you hoping to gain by not running mmlsfileset? Is the problem scaling due to the number of filesets that you have defined? On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Happy New Year to all! Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best! > > Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done. > > The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all. > > Thanks in advance. > > Kevin > > P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. > P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. > > ??? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine From S.J.Thompson at bham.ac.uk Wed Jan 9 22:42:01 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 9 Jan 2019 22:42:01 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: Hi Kevin, Have you looked at the rest API? https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_listofapicommands.htm I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! 
Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From Paul.Sanchez at deshaw.com Wed Jan 9 23:03:08 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 9 Jan 2019 23:03:08 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <20190109213704.bbbqbuqzkrotcjpu@utumno.gs.washington.edu> Message-ID: <3d408800d50648dfae25c3c95c1f04c1@mbxtoa1.winmail.deshaw.com> You could also wrap whatever provisioning script you're using (the thing that runs mmcrfileset), which must already be running as root, so that it also updates the cached text file afterward. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Skylar Thompson Sent: Wednesday, January 9, 2019 4:37 PM To: Kevin.Buterbaugh at Vanderbilt.Edu Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? I suppose you could run the underlying tslsfileset, though that's probably not the answer you're looking for. Out of curiousity, what are you hoping to gain by not running mmlsfileset? Is the problem scaling due to the number of filesets that you have defined? On Tue, Jan 08, 2019 at 10:12:22PM +0000, Buterbaugh, Kevin L wrote: > Hi All, > > Happy New Year to all! Personally, I???ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I???m referring to), but I certainly wish all of you the best! > > Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven???t found them yet in the searching I???ve done. > > The reason I???m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. 
There are obviously multiple issues with that, so the workaround we???re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That???s sub-optimal for any day on which a fileset gets created or deleted, so I???m looking for a better way ??? one which doesn???t require root privileges and preferably doesn???t involve running a GPFS command at all. > > Thanks in advance. > > Kevin > > P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. > P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. > > ??? > Kevin Buterbaugh - Senior System Administrator Vanderbilt University - > Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- -- Skylar Thompson (skylar2 at u.washington.edu) -- Genome Sciences Department, System Administrator -- Foege Building S046, (206)-685-7354 -- University of Washington School of Medicine _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 9 23:07:00 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 9 Jan 2019 23:07:00 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: Hi All, Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not. The Python script in question is to display quota information for a user. The mmlsquota command has a couple of issues: 1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it. For example, it will display all the filesets in a filesystem whether or not the user has access to them. So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format. And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for. Thanks! Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 4:42 PM, Simon Thompson > wrote: Hi Kevin, Have you looked at the rest API? 
https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&reserved=0 I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Jan 10 01:13:55 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 10 Jan 2019 01:13:55 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_runningmmlsfileset? In-Reply-To: References: , <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Jan 10 20:42:50 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 10 Jan 2019 20:42:50 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_runningmmlsfileset? 
In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> Hi Andrew / All, Well, it does _sound_ useful, but in its current state it?s really not for several reasons, mainly having to do with it being coded in a moderately site-specific way. It needs an overhaul anyway, so I?m going to look at getting rid of as much of that as possible (there?s some definite low-hanging fruit there) and, for the site-specific things that can?t be gotten rid of, maybe consolidating them into one place in the code so that the script could be more generally useful if you just change those values. If I can accomplish those things, then yes, we?d be glad to share the script. But I?ve also realized that I didn?t _entirely_ answer my original question. Yes, mmlsquota will show me all the filesets ? but I also need to know the junction path for each of those filesets. One of the main reasons we wrote this script in the first place is that if you run mmlsquota you see that you have no limits on about 60 filesets (currently we use fileset quotas only on our filesets) ? and that?s because there are no user (or group) quotas in those filesets. The script, however, reads in that text file that is created nightly by root that is nothing more than the output of ?mmlsfileset ?, gets the junction path, looks up the GID of the junction path, and sees if you?re a member of that group. If you?re not, well, no sense in showing you anything about that fileset. But, of course, if you are a member of that group, then we do want to show you the fileset quota for that fileset. So ? my question now is, ?Is there a way for a non-root user? to get the junction path for the fileset(s)? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 7:13 PM, Andrew Beattie > wrote: Kevin, That sounds like a useful script would you care to share? Thanks Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: "Buterbaugh, Kevin L" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Date: Thu, Jan 10, 2019 9:22 AM Hi All, Let me answer Skylar?s questions in another e-mail, which may also tell whether the rest API is a possibility or not. The Python script in question is to display quota information for a user. The mmlsquota command has a couple of issues: 1) its output is confusing to some of our users, 2) more significantly, it displays a ton of information that doesn?t apply to the user running it. For example, it will display all the filesets in a filesystem whether or not the user has access to them. So the Python script figures out what group(s) the user is a member of and only displays information pertinent to them (i.e. the group of the fileset junction path is a group this user is a member of) ? and in a simplified (and potentially colorized) output format. And typing that preceding paragraph caused the lightbulb to go off ? I know the answer to my own question ? have the script run mmlsquota and get the full list of filesets from that, then parse that to determine which ones I actually need to display quota information for. Thanks! Kevin ? 
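To illustrate the "run mmlsquota and parse it" idea above, a rough shell sketch (easy enough to port into the Python script) follows. It uses the colon-delimited -Y output and reads the HEADER row to locate columns, because the field names used here ("quotaType", "name") are assumptions that should be checked against the header your release actually prints; "gpfs0" is a placeholder filesystem name.

    # list the fileset names visible to the calling user via mmlsquota -Y (no root needed)
    mmlsquota -Y gpfs0 | awk -F: '
        /:HEADER:/ { for (i = 1; i <= NF; i++) col[$i] = i; next }     # map header names to column numbers
        col["quotaType"] && $col["quotaType"] == "FILESET" { print $col["name"] }
    '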
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 9, 2019, at 4:42 PM, Simon Thompson > wrote: Hi Kevin, Have you looked at the rest API? https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY_5.0.2%2Fcom.ibm.spectrum.scale.v5r02.doc%2Fbl1adm_listofapicommands.htm&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=uotWilntiZa2E9RIBE2ikhxxBm3Mk3y%2FW%2FKUHovaJpY%3D&reserved=0 I don't know how much access control there is available in the API so not sure if you could lock some sort of service user down to just the get filesets command? Simon _______________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 January 2019 22:12 To: gpfsug main discussion list Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? 
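As a concrete illustration of the REST API suggestion quoted above, a request along these lines should return the fileset list. The host name, credentials and filesystem name below are placeholders, and both the exact endpoint path and how far a locked-down service user can be restricted need to be verified against the API documentation for your release.

    # GET the filesets of one filesystem from the Scale GUI/REST server (placeholder names)
    curl -k -u svc_quota:SomePassword \
        "https://gui-node.example.com:443/scalemgmt/v2/filesystems/gpfs0/filesets"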
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C36fb451ce9a945f5e0cb08d67683af85%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826705300525885&sdata=WSijRrjhOgQyuWsh9K8ckpjf%2F2CkXfZW1n%2BJw5Gw5tw%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cc1ffac821c5f4524104908d67698e948%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636826796467009700&sdata=Xfz4JiItI8ukHgnvO5YoN27jVpk6Ngsk03NtMrKJcHk%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Fri Jan 11 12:50:17 2019 From: p.childs at qmul.ac.uk (Peter Childs) Date: Fri, 11 Jan 2019 12:50:17 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> Message-ID: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. 
We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. -- Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 11 14:19:28 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Jan 2019 14:19:28 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades Message-ID: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> I'll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ... We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was only 4 nodes + 1 quorum node - manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought, we'll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, let's add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. Then some time later, we needed to restart another of the CES nodes; when we started GPFS on the node, it caused havoc in our cluster - CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought, and disabled the node in the cluster. This made things stabilise and, as we'd been having other GPFS issues, we didn't want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to contend with. More time passes and we're about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data syncs and then try to start our protocol nodes to test them. No dice -
We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Fri Jan 11 14:58:20 2019 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Fri, 11 Jan 2019 15:58:20 +0100 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: Hi Simon, you likely run into the following issue: APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896 This problem happens only if you use different host domains within a cluster and will mostly impact CES. It is unrelated to upgrade or mixed version clusters. Its has been fixed with 5.0.2, therefore I recommend to upgrade soon. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 11/01/2019 15:19 Subject: [gpfsug-discuss] A cautionary tale of upgrades Sent by: gpfsug-discuss-bounces at spectrumscale.org I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. 
The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Fri Jan 11 15:00:51 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Fri, 11 Jan 2019 15:00:51 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: Hallo Simon, Welcome to the Club. These behavior are a Bug in tsctl to change the DNS names . We had this already 4 weeks ago. The fix was Update to 5.0.2.1. Regards Renar Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Am 11.01.2019 um 15:19 schrieb Simon Thompson >: I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? 
we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Fri Jan 11 15:48:50 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 11 Jan 2019 15:48:50 +0000 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk>, Message-ID: Could well be. Still it's pretty scary that this sort of thing could hit you way after the different DNS name nodes were added. It might be months before you restart the CES nodes. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of MDIETZ at de.ibm.com [MDIETZ at de.ibm.com] Sent: 11 January 2019 14:58 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] A cautionary tale of upgrades Hi Simon, you likely run into the following issue: APAR IV93896 - https://www-01.ibm.com/support/docview.wss?uid=isg1IV93896 This problem happens only if you use different host domains within a cluster and will mostly impact CES. 
It is unrelated to upgrade or mixed version clusters. Its has been fixed with 5.0.2, therefore I recommend to upgrade soon. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development - Release Lead Architect (4.2.x) Spectrum Scale RAS Architect --------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49 70342744105 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ----------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk WittkoppSitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 11/01/2019 15:19 Subject: [gpfsug-discuss] A cautionary tale of upgrades Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I?ll start by saying this is our experience, maybe we did something stupid along the way, but just in case others see similar issues ? We have a cluster which contains protocol nodes, these were all happily running GPFS 5.0.1-2 code. But the cluster was a only 4 nodes + 1 quorum node ? manager and quorum functions were handled by the 4 protocol nodes. Then one day we needed to reboot a protocol node. We did so and its disk controller appeared to have failed. Oh well, we thought we?ll fix that another day, we still have three other quorum nodes. As they are all getting a little long in the tooth and were starting to struggle, we thought, well we have DME, lets add some new nodes for quorum and token functions. Being shiny and new they were all installed with GPFS 5.0.2-1 code. All was well. The some-time later, we needed to restart another of the CES nodes, when we started GPFS on the node, it was causing havock in our cluster ? CES IPs were constantly being assigned, then removed from the remaining nodes in the cluster. Crap we thought and disabled the node in the cluster. This made things stabilise and as we?d been having other GPFS issues, we didn?t want service to be interrupted whilst we dug into this. Besides, it was nearly Christmas and we had conferences and other work to content with. More time passes and we?re about to cut over all our backend storage to some shiny new DSS-G kit, so we plan a whole system maintenance window. We finish all our data sync?s and then try to start our protocol nodes to test them. No dice ? we can?t get any of the nodes to bring up IPs, the logs look like they start the assignment process, but then gave up. A lot of digging in the mm korn shell scripts, and some studious use of DEBUG=1 when testing, we find that mmcesnetmvaddress is calling ?tsctl shownodes up?. On our protocol nodes, we find output of the form: bear-er-dtn01.bb2.cluster.cluster,rds-aw-ctdb01-data.bb2.cluster.cluster,rds-er-ctdb01-data.bb2.cluster.cluster,bber-irods-ires01-data.bb2.cluster.cluster,bber-irods-icat01-data.bb2.cluster.cluster,bbaw-irods-icat01-data.bb2.cluster.cluster,proto-pg-mgr01.bear.cluster.cluster,proto-pg-pf01.bear.cluster.cluster,proto-pg-dtn01.bear.cluster.cluster,proto-er-mgr01.bear.cluster.cluster,proto-er-pf01.bear.cluster.cluster,proto-aw-mgr01.bear.cluster.cluster,proto-aw-pf01.bear.cluster.cluster Now our DNS name for these nodes is bb2.cluster ? something is repeating the DNS name. So we dig around, resolv.conf, /etc/hosts etc all look good and name resolution seems fine. 
We look around on the manager/quorum nodes and they don?t do this cluster.cluster thing. We can?t find anything else Linux config wise that looks bad. In fact the only difference is that our CES nodes are running 5.0.1-2 and the manager nodes 5.0.2-1. Given we?re changing the whole storage hardware, we didn?t want to change the GPFS/NFS/SMB code on the CES nodes, (we?ve been bitten before with SMB packages not working properly in our environment), but we go ahead and do GPFS and NFS packages. Suddenly, magically all is working again. CES starts fine and IPs get assigned OK. And tsctl gives the correct output. So, my supposition is that there is some incompatibility between 5.0.1-2 and 5.0.2-1 when running CES and the cluster manager is running on 5.0.2-1. As I said before, I don?t have hard evidence we did something stupid, but it certainly is fishy. We?re guessing this same ?feature? was the cause of the CES issues we saw when we rebooted a CES node and the IPs kept deassigning? It looks like all was well as we added the manager nodes after CES was started, but when a CES node restarted, things broke. We got everything working again in house so didn?t raise a PMR, but if you find yourself in this upgrade path, beware! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Fri Jan 11 17:31:35 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Fri, 11 Jan 2019 14:31:35 -0300 Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset? In-Reply-To: <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <6A909228-87E7-468E-A51C-086B9C75BB18@vanderbilt.edu> Message-ID: ?Is there a way for a non-root user? to get the junction path for the fileset(s)? Presuming the user has some path to some file in the fileset... Issue `mmlsattr -L path` then "walk" back towards the root by discarding successive path suffixes and watch for changes in the fileset name field Why doesn't mmlsfileset work for non-root users? I don't know. Perhaps the argument has to do with security or confidentiality. On my test system it gives a bogus error, when it should say something about root or super-user. -------------- next part -------------- An HTML attachment was scrubbed... URL: From JRLang at uwyo.edu Fri Jan 11 16:24:17 2019 From: JRLang at uwyo.edu (Jeffrey R. Lang) Date: Fri, 11 Jan 2019 16:24:17 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> Message-ID: What we do is the use ?mmlsquota -Y ? which will list out all the filesets in an easily parseable format. And the command can be run by the user. From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Peter Childs Sent: Friday, January 11, 2019 6:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? ? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources. We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. 
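A rough shell sketch of Marc's mmlsattr walk-up suggestion from earlier in this digest: start from any path the user can read inside the fileset, then walk towards the root until the fileset name reported by mmlsattr -L changes. The starting path is a placeholder, and the awk match assumes the usual "fileset name:" label in mmlsattr -L output, so check that on your own system.

    p=/gpfs/filesystem/some/dir      # placeholder: any readable path inside the fileset
    f=$(mmlsattr -L "$p" | awk -F': *' '/^fileset name/ {print $2}')
    while [ "$p" != "/" ]; do
        parent=$(dirname "$p")
        pf=$(mmlsattr -L "$parent" 2>/dev/null | awk -F': *' '/^fileset name/ {print $2}')
        if [ "$pf" != "$f" ]; then
            # parent is in a different fileset (or outside GPFS), so $p is the junction
            echo "junction of fileset '$f' appears to be: $p"
            break
        fi
        p=$parent
    done

Combined with the fileset names from mmlsquota -Y, that would give the junction's owner and group without needing mmlsfileset or root.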
We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Sat Jan 12 03:07:29 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Sat, 12 Jan 2019 03:07:29 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> Message-ID: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> Hi All, I appreciate the time several of you have taken to respond to my inquiry. However, unless I?m missing something - and my apologies if I am - none so far appear to allow me to obtain the list of junction paths as a non-root user. Yes, mmlsquota shows all the filesets. But from there I need to then be able to find out where that fileset is mounted in the directory tree so that I can see who the owner and group of that directory are. Only if the user running the script is either the owner or a member of the group do I want to display the fileset quota for that fileset to the user. Thanks again? Kevin On Jan 11, 2019, at 10:24 AM, Jeffrey R. Lang > wrote: What we do is the use ?mmlsquota -Y ? which will list out all the filesets in an easily parseable format. And the command can be run by the user. 
From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Peter Childs Sent: Friday, January 11, 2019 6:50 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? ? This message was sent from a non-UWYO address. Please exercise caution when clicking links or opening attachments from external sources. We have a similar issue, I'm wondering if getting mmlsfileset to work as a user is a reasonable "request for enhancement" I suspect it would need better wording. We too have a rather complex script to report on quota's that I suspect does a similar job. It works by having all the filesets mounted in known locations and names matching mount point names. It then works out which ones are needed by looking at the group ownership, Its very slow and a little cumbersome. Not least because it was written ages ago in a mix of bash, sed, awk and find. On Tue, 2019-01-08 at 22:12 +0000, Buterbaugh, Kevin L wrote: Hi All, Happy New Year to all! Personally, I?ll gladly and gratefully settle for 2019 not being a dumpster fire like 2018 was (those who attended my talk at the user group meeting at SC18 know what I?m referring to), but I certainly wish all of you the best! Is there a way to get a list of the filesets in a filesystem without running mmlsfileset? I was kind of expecting to find them in one of the config files somewhere under /var/mmfs but haven?t found them yet in the searching I?ve done. The reason I?m asking is that we have a Python script that users can run that needs to get a list of all the filesets in a filesystem. There are obviously multiple issues with that, so the workaround we?re using for now is to have a cron job which runs mmlsfileset once a day and dumps it out to a text file, which the script then reads. That?s sub-optimal for any day on which a fileset gets created or deleted, so I?m looking for a better way ? one which doesn?t require root privileges and preferably doesn?t involve running a GPFS command at all. Thanks in advance. Kevin P.S. I am still working on metadata and iSCSI testing and will report back on that when complete. P.P.S. We ended up adding our new NSDs comprised of (not really) 12 TB disks to the capacity pool and things are working fine. ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cee10c1e22a474fedceb408d678318231%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C1%7C636828551398056004&sdata=F56RKhMef0zYjAj2dKFu3bAuq7xQvFoulYhwDnfN1Ms%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Sat Jan 12 20:42:42 2019 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Sat, 12 Jan 2019 15:42:42 -0500 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? 
In-Reply-To: <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> References: <30C84AD4-D2BB-4923-A8BD-B51C8D8D347A@vanderbilt.edu> <300e6e1f8fb4daf9278f0b67262c4046127e0bbf.camel@qmul.ac.uk> <1CD7EBDE-F39D-4410-9028-EF9FBF22C6EC@vanderbilt.edu> Message-ID: <13713.1547325762@turing-police.cc.vt.edu> On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: > But from there I need to then be able to find out where that fileset is > mounted in the directory tree so that I can see who the owner and group of that > directory are. You're not able to leverage a local naming scheme? There's no connection between the name of the fileset and where it is in the tree? I would hope there is, because otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will now be confused over what director(y/ies) need to be cleaned up. If your tool says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at /gpfs/foo/bar/baz then it's actionable. And if the user knows what the mapping is, your script can know it too.... From scottg at emailhosting.com Mon Jan 14 04:09:57 2019 From: scottg at emailhosting.com (Scott Goldman) Date: Sun, 13 Jan 2019 23:09:57 -0500 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: <13713.1547325762@turing-police.cc.vt.edu> Message-ID: Kevin, Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output. As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system. Sent from my BlackBerry - the most secure mobile device ? Original Message ? From: valdis.kletnieks at vt.edu Sent: January 12, 2019 4:07 PM To: gpfsug-discuss at spectrumscale.org Reply-to: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: > But from there I need to then be able to find out where that fileset is > mounted in the directory tree so that I can see who the owner and group of that > directory are. You're not able to leverage a local naming scheme? There's no connection between the name of the fileset and where it is in the tree?? I would hope there is, because otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will now be confused over what director(y/ies) need to be cleaned up.? If your tool says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at /gpfs/foo/bar/baz then it's actionable. And if the user knows what the mapping is, your script can know it too.... _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From olaf.weiser at de.ibm.com Mon Jan 14 06:31:28 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 14 Jan 2019 07:31:28 +0100 Subject: [gpfsug-discuss] A cautionary tale of upgrades In-Reply-To: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> References: <7085B14A-92B4-47DE-9411-FB544FD4A610@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From sandeep.patil at in.ibm.com Mon Jan 14 12:54:29 2019 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Mon, 14 Jan 2019 12:54:29 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q4 2018) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements http://www.redbooks.ibm.com/abstracts/redp5525.html?Open IBM Spectrum Scale Memory Usage https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2 Spectrum Scale and Containers https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/ IBM Elastic Storage Server Performance Graphical Visualization with Grafana https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/ Hadoop Performance for disaggregated compute and storage configurations based on IBM Spectrum Scale Storage https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/ EMS HA in ESS LE (Little Endian) environment https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/ What?s new in ESS 5.3.2 https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/ Administer your Spectrum Scale cluster easily https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/ Disaster Recovery using Spectrum Scale?s Active File Management https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/ Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS) https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/ Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1 https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 10/03/2018 08:48 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. 
Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? 
https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. 
This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From cabrillo at ifca.unican.es Tue Jan 15 10:49:58 2019 From: cabrillo at ifca.unican.es (Iban Cabrillo) Date: Tue, 15 Jan 2019 11:49:58 +0100 (CET) Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Message-ID: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 14 15:02:07 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 14 Jan 2019 15:02:07 +0000 Subject: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? In-Reply-To: References: Message-ID: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Hi Scott and Valdis (and everyone else), Thanks for your responses. Yes, we _could_ easily build a local naming scheme ? the name of the fileset matches the name of a folder in one of a couple of parent directories. However, an earlier response to my post asked if we?d be willing to share our script with the community and we would ? _if_ we can make it generic enough to be useful. Local naming schemes hardcoded in the script make it much less generically useful. Plus, it just seems to me that there ought to be a way to do this ? to get a list of fileset names from mmlsquota and then programmatically determine their junction path without having root privileges. GPFS has got to be storing that information somewhere, and I?m frankly quite surprised that no IBMer has responded with an answer to that. But I also know that when IBM is silent, there?s typically a reason. And yes, we could regularly create a static file ? in fact, that?s what we do now once per day (in the early morning hours). While this is not a huge deal - we only create / delete filesets a handful of times per month - on the day we do the script won?t function properly unless we manually update the file. I?m wanting to eliminate that, if possible ? which as I stated in the preceding paragraph, I have a hard time believing is not possible. I did look at the list of callbacks again (good thought!) and there?s not one specifically related to the creation / deletion of a fileset. There was only one that I saw that I think could even possibly be of use ? ccrFileChange. Can anyone on the list confirm or deny that the creation / deletion of a fileset would cause that callback to be triggered?? 
If it is triggered, then we could use that to update the static filesets within a minute or two of the change being made, which would definitely be acceptable. I realize that many things likely trigger a ccrFileChange, so I?m thinking of having a callback script that checks the current list of filesets against the static file and updates that appropriately. Thanks again for the responses? Kevin
> On Jan 13, 2019, at 10:09 PM, Scott Goldman wrote: > > Kevin, > Something I've done in the past is to create a service that once an hour/day/week that would build a static file that consists of the needed output. > > As long as you can take the update delay (or perhaps trigger the update with a callback), this should work and could actually be lighter on the system. > > Sent from my BlackBerry - the most secure mobile device > > Original Message > From: valdis.kletnieks at vt.edu > Sent: January 12, 2019 4:07 PM > To: gpfsug-discuss at spectrumscale.org > Reply-to: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] Get list of filesets _without_ running mmlsfileset? > > On Sat, 12 Jan 2019 03:07:29 +0000, "Buterbaugh, Kevin L" said: >> But from there I need to then be able to find out where that fileset is >> mounted in the directory tree so that I can see who the owner and group of that >> directory are. > > You're not able to leverage a local naming scheme? There's no connection between > the name of the fileset and where it is in the tree? I would hope there is, because > otherwise when your tool says 'Fileset ag5eg19 is over quota', your poor user will > now be confused over what director(y/ies) need to be cleaned up. If your tool > says 'Fileset foo_bar_baz is over quota' and the user knows that's mounted at > /gpfs/foo/bar/baz then it's actionable. > > And if the user knows what the mapping is, your script can know it too.... >
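To make the static-file approach Scott and Kevin describe a little more concrete, here is a minimal sketch of the root-side half (run from cron, or from a callback if one turns out to fire on fileset changes). The filesystem name, the cache path, and the -Y field names used below (filesetName, path) are assumptions to be checked against your own release, not something taken from the thread; the -Y parsing simply follows the HEADER record rather than hard-coding field positions.

    #!/usr/bin/env python3
    # Root-side half of the cron/static-file approach: dump "fileset <tab>
    # junction path" to a world-readable cache for the users' quota script.
    # Filesystem name, cache path and -Y field names are assumptions.
    import subprocess
    from urllib.parse import unquote

    FS = "gpfs1"                              # hypothetical filesystem name
    CACHE = "/var/mmfs/tmp/filesets.cache"    # hypothetical cache location

    def parse_y(output):
        """Generic parser for mm command -Y output: the HEADER record names
        the fields, data records follow in the same colon-delimited layout."""
        fields, rows = None, []
        for line in output.splitlines():
            cols = line.rstrip().split(":")
            if len(cols) > 2 and cols[2] == "HEADER":
                fields = cols
            elif fields and len(cols) > 2:
                rows.append(dict(zip(fields, cols)))
        return rows

    def dump_filesets():
        out = subprocess.run(["/usr/lpp/mmfs/bin/mmlsfileset", FS, "-Y"],
                             stdout=subprocess.PIPE, universal_newlines=True,
                             check=True).stdout
        with open(CACHE, "w") as cache:
            for row in parse_y(out):
                name = row.get("filesetName", "")
                path = unquote(row.get("path", ""))   # -Y output may URL-encode paths
                if name and path:
                    cache.write("%s\t%s\n" % (name, path))

    if __name__ == "__main__":
        dump_filesets()

The user-side script then only needs to read the cache, os.stat() each junction path, and report the quota for a fileset when the caller owns the directory or belongs to its group; whether a ccrFileChange callback can keep the cache fresher than a daily cron run is exactly the open question above.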
From makaplan at us.ibm.com Tue Jan 15 14:46:18 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 15 Jan 2019 11:46:18 -0300 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Jan 15 15:11:41 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 15 Jan 2019 15:11:41 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset?
In-Reply-To: References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: <0D5558D9-9003-4B95-9A37-42321E03114D@vanderbilt.edu> Hi Marc (All), Yes, I can easily determine where filesets are linked here ? it is, as you said, in just one or two paths. The script as it stands now has been doing that for several years and only needs a couple of relatively minor tweaks to be even more useful to _us_ by whittling down a couple of edge cases relating to fileset creation / deletion. However ? there was a request to share the script with the broader community ? something I?m willing to do if I can get it in a state where it would be useful to others with little or no modification. Anybody who?s been on this list for any length of time knows how much help I?ve received from the community over the years. I truly appreciate that and would like to give back, even in a minor way, if possible. But in order to do that the script can?t be full of local assumptions ? that?s it in a nutshell ? that?s why I want to programmatically determine the junction path at run time as a non-root user. I?ll also mention here that early on in this thread Simon Thompson suggested looking into the REST API. Sure enough, you can get the information that way ? but, AFAICT, that would require the script to contain a username / password combination that would allow anyone with access to the script to then use that authentication information to access other information within GPFS that we probably don?t want them to have access to. If I?m mistaken about that, then please feel free to enlighten me. Thanks again? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Jan 15, 2019, at 8:46 AM, Marc A Kaplan > wrote: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cbd2c28fdb60041f3434e08d67af83b11%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636831603904557717&sdata=A74TTq%2FQvyhEMHaolklbiMAEnaGVuHNiyhVYfn4wRek%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Jan 15 15:36:39 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 15 Jan 2019 16:36:39 +0100 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors In-Reply-To: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> References: <1730394866.8701339.1547549398355.JavaMail.zimbra@ifca.unican.es> Message-ID: Hello Iban, the pmsensor and pmcollector packages together with the GUI dashboard and statistics pages are not designed to be a general monitoring solution. For example. 
in many places we are filtering for GPFS nodes that are known to be cluster members and we try to match host names to GPFS node names. This causes the lack of nodes in GUI charts you are experiencing. In addition. the CLI based setup and management of the sensors assume that sensor nodes are cluster nodes. We are not intending to open up the internal management and views for data outside the cluster in the futute.- The requirements to provide plotting, filtering, aggregation and calculation in a general plotting environment can be very diverse and we may not be able to handle this. So while we are flattered by the request to use our charting capabilities as a general solution, we propose to use tools like grafana as more general solution. Please note that the GUI charts and dashboards have URLs that allow them to be hyperlinked, so you could also combine other web based charting tools together with the GUI based charts. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Iban Cabrillo To: gpfsug-discuss Date: 15.01.2019 12:05 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D690169.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jan 15 15:57:39 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 15 Jan 2019 15:57:39 +0000 Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Message-ID: Understand that you don?t want to install Grafana on its own, but there is a GPFS Grafana bridge I believe that would allow you to include the GPFS collected data in a Grafana dashboard. So if not wanting to setup sensors for that data is the reason you don?t want Grafana, then using the bridge might pull the data you want? Simon From: on behalf of "cabrillo at ifca.unican.es" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 15 January 2019 at 11:05 To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Monitoring non gpfs cluster nodes using pmsensors Dear, The gpfsgui dashboard show us most part of relevant information for cluster management. 
Avoiding to install other plot utilities (like graphana for example), we want to explore the possibility to use this packages to harvest and plot this information, in order to centralize the graph management in one only place. We see this information arrives to the gpfsgui node (from non gpfs cluster nodes), but we can't show the plots. Is there any way to use the pmsensor and pmcollector packages to monitorice / plot non gpfs cluster nodes using the gpfsgui dashboard ? Regards, I -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Wed Jan 16 08:16:58 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Wed, 16 Jan 2019 08:16:58 +0000 Subject: [gpfsug-discuss] Get list offilesets_without_runningmmlsfileset? In-Reply-To: References: , <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15475476039319.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.154754760393110.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.154754760393111.png Type: image/png Size: 1134 bytes Desc: not available URL: From makaplan at us.ibm.com Wed Jan 16 12:57:18 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 16 Jan 2019 09:57:18 -0300 Subject: [gpfsug-discuss] Get fileset and other info via Rest API and/or GUI In-Reply-To: References: , <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: Good to know the "Rest" does it for us. Since I started working on GPFS internals and CLI utitlities around Release 3.x, I confess I never had need of the GUI or the Rest API server. In fact I do most of my work remotely via Putty/Xterm/Emacs and only once-in-a-while even have an XWindows or VNC server/view of a GPFS node! So consider any of my remarks in that context. So I certainly defer to others when it comes to Spectrum Scale GUIs, "Protocol" servers and such. If I'm missing anything great, perhaps some kind soul will send me a note offline from this public forum. --Marc.K of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Wed Jan 16 16:18:16 2019 From: spectrumscale at kiranghag.com (KG) Date: Wed, 16 Jan 2019 21:48:16 +0530 Subject: [gpfsug-discuss] Filesystem automount issues Message-ID: Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jan 16 16:33:25 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Jan 2019 11:33:25 -0500 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: What does the output of "mmlsmount all -L" show? 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: KG To: gpfsug main discussion list Date: 01/16/2019 11:19 AM Subject: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Wed Jan 16 18:14:39 2019 From: spectrumscale at kiranghag.com (KG) Date: Wed, 16 Jan 2019 23:44:39 +0530 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: It shows that the filesystem is not mounted On Wed, Jan 16, 2019, 22:03 Frederick Stock What does the output of "mmlsmount all -L" show? > > Fred > __________________________________________________ > Fred Stock | IBM Pittsburgh Lab | 720-430-8821 > stockf at us.ibm.com > > > > From: KG > To: gpfsug main discussion list > Date: 01/16/2019 11:19 AM > Subject: [gpfsug-discuss] Filesystem automount issues > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi > > IHAC running Scale 5.x on RHEL 7.5 > > One out of two filesystems (/home) does not get mounted automatically at > boot. (/home is scale filesystem) > > The scale log does mention that the filesystem is mounted but mount output > says otherwise. > > There are no entries for /home in fstab since we let scale mount it. > Automount on scale and filesystem both have been set to yes. > > Any pointers to troubleshoot would be appreciated. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Wed Jan 16 18:38:07 2019 From: stockf at us.ibm.com (Frederick Stock) Date: Wed, 16 Jan 2019 13:38:07 -0500 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: Would it be possible for you to include the output of "mmlsmount all -L" and "df -k" in your response? 
Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: KG To: gpfsug main discussion list Date: 01/16/2019 01:15 PM Subject: Re: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org It shows that the filesystem is not mounted On Wed, Jan 16, 2019, 22:03 Frederick Stock To: gpfsug main discussion list Date: 01/16/2019 11:19 AM Subject: [gpfsug-discuss] Filesystem automount issues Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi IHAC running Scale 5.x on RHEL 7.5 One out of two filesystems (/home) does not get mounted automatically at boot. (/home is scale filesystem) The scale log does mention that the filesystem is mounted but mount output says otherwise. There are no entries for /home in fstab since we let scale mount it. Automount on scale and filesystem both have been set to yes. Any pointers to troubleshoot would be appreciated. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Jan 16 20:01:53 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 16 Jan 2019 21:01:53 +0100 Subject: [gpfsug-discuss] Filesystem automount issues In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 17 11:35:13 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Jan 2019 11:35:13 +0000 Subject: [gpfsug-discuss] Node expels Message-ID: We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 17 11:46:19 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 17 Jan 2019 13:46:19 +0200 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. 
it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Thu Jan 17 13:28:15 2019 From: TOMP at il.ibm.com (Tomer Perry) Date: Thu, 17 Jan 2019 15:28:15 +0200 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlewars at us.ibm.com Thu Jan 17 14:30:45 2019 From: jlewars at us.ibm.com (John Lewars) Date: Thu, 17 Jan 2019 09:30:45 -0500 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf slide 13. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Thu Jan 17 19:02:06 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 17 Jan 2019 19:02:06 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: , Message-ID: So we've backed out a bunch of network tuning parameters we had set (based on the GPFS wiki pages), they've been set a while but um ... maybe they are causing issues. Secondly, we've noticed in dump tscomm that we see connection broken to a node, and then the node ID is usually the same node, which is a bit weird to me. 
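As a rough illustration of the low-level checks Tomer and John describe above, the same socket and adapter state can be inspected from the shell on an affected node. This is only a sketch: the interface name eth0 is a placeholder, 10.20.0.58 is the peer from the logs earlier in the thread, and the ring sizes are example values rather than recommendations for any particular adapter.

# Per-connection TCP internals (ca_state, retransmits, rto) - the same data GPFS dumps on an expel:
ss -ti dst 10.20.0.58

# Current versus hardware-maximum ring buffer sizes on the NIC:
ethtool -g eth0

# If the current rings sit at small defaults, raise them towards the hardware maximum (example values):
ethtool -G eth0 rx 4096 tx 4096

If ethtool -g shows the rings well below the hardware limit, raising them is the sort of tuning John refers to.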
We've also just updated firmware on the Intel nics (the x722) which is part of the Skylake board. And specifically its the newer skylake kit we see this problem on. We've a number of issues with the x722 firmware (like it won't even bring a link up when plugged into some of our 10GbE switches, but that's another story). We've also dropped the bonded links from these nodes, just in case its related... Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jlewars at us.ibm.com [jlewars at us.ibm.com] Sent: 17 January 2019 14:30 To: Tomer Perry; gpfsug main discussion list Cc: Yong Ze Chen Subject: Re: [gpfsug-discuss] Node expels >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels ________________________________ Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13. 
Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node. 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56 They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren?t seeing link congestion on the links between them. On the node I listed above, it?s not actually doing anything either as the software on it is still being installed (i.e. it?s not doing GPFS or any other IO other than a couple of home directories). Any suggestions on what ?(socket 153) state is unexpected? means? Thanks Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From orichards at pixitmedia.com Thu Jan 17 20:52:50 2019 From: orichards at pixitmedia.com (Orlando Richards) Date: Thu, 17 Jan 2019 20:52:50 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: <4e0ea3c4-3076-e9a0-55c3-58f98be96d9b@pixitmedia.com> Hi Simon, We've had to disable the offload's for Intel cards in many situations with the i40e drivers - Redhat have an article about it: https://access.redhat.com/solutions/3662011 ------- Orlando On 17/01/2019 19:02, Simon Thompson wrote: > So we've backed out a bunch of network tuning parameters we had set > (based on the GPFS wiki pages), they've been set a while but um ... > maybe they are causing issues. > > Secondly, we've noticed in dump tscomm that we see connection broken > to a node, and then the node ID is usually the same node, which is a > bit weird to me. > > We've also just updated firmware on the Intel nics (the x722) which is > part of the Skylake board. And specifically its the newer skylake kit > we see this problem on. We've a number of issues with the x722 > firmware (like it won't even bring a link up when plugged into some of > our 10GbE switches, but that's another story). > > We've also dropped the bonded links from these nodes, just in case its > related... > > Simon > > ------------------------------------------------------------------------ > *From:* gpfsug-discuss-bounces at spectrumscale.org > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of > jlewars at us.ibm.com [jlewars at us.ibm.com] > *Sent:* 17 January 2019 14:30 > *To:* Tomer Perry; gpfsug main discussion list > *Cc:* Yong Ze Chen > *Subject:* Re: [gpfsug-discuss] Node expels > > >They always appear to be to a specific type of hardware with the same > Ethernet controller, > > That makes me think you might be seeing packet loss that could require > ring buffer tuning (the defaults and limits will differ with different > ethernet adapters). > > The expel section in the slides on this page has been expanded to > include a 'debugging expels section' (slides 19-20, which also > reference ring buffer tuning): > https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 > > Regards, > John Lewars > Spectrum Scale Performance, IBM Poughkeepsie > > > > > From: Tomer Perry/Israel/IBM > To: gpfsug main discussion list > Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN > Date: 01/17/2019 08:28 AM > Subject: Re: [gpfsug-discuss] Node expels > ------------------------------------------------------------------------ > > > Hi, > > I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). > > As written on the slide: > One of the best ways to determine if a network layer problem is root > cause for an expel is to look at the low-level socket details dumped > in the ?extra? log data (mmfs dump all) saved as part of automatic > data collection on Linux GPFS nodes. > > So, the idea is that in expel situation, we dump the socket state from > the OS ( you can see the same using 'ss -i' for example). > In your example, it shows that the ca_state is 4, there are > retransmits, high rto and all the point to a network problem. 
> You can find more details here: > http://www.yonch.com/tech/linux-tcp-congestion-control-internals > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: ? ?+1 720 3422758 > Israel Tel: ? ? ?+972 3 9188625 > Mobile: ? ? ? ? +972 52 2554625 > > > > > > From: "Tomer Perry" > To: gpfsug main discussion list > Date: 17/01/2019 13:46 > Subject: Re: [gpfsug-discuss] Node expels > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > Simon, > > Take a look at > _http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf_slide > 13. > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: ? ?+1 720 3422758 > Israel Tel: ? ? ?+972 3 9188625 > Mobile: ? ? ? ? +972 52 2554625 > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" > Date: 17/01/2019 13:35 > Subject: [gpfsug-discuss] Node expels > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > We?ve recently been seeing quite a few node expels with messages of > the form: > > 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address > 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is > unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 > probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 > rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 > 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data > collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster > 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug > data to proto-pg-pf01.bear.cluster localNode > 2019-01-17_11:19:30.882+0000: [I] Calling user exit script > gpfsSendRequestToNodes: event sendRequestToNodes, Async command > /usr/lpp/mmfs/bin/mmcommon. > 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for > a commMsgCheckMessages reply from node 10.20.0.58 > proto-pg-pf01.bear.cluster. Sending expel message. > > On the client node, we see messages of the form: > > 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data > collection request from 10.10.0.33 > 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp > debug data on this node. > 2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data > collection request from 10.10.0.33 > 2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug > data on this node. > 2019-01-17_11:25:02.741+0000: [N] This node will be expelled from > cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b > ber-les-nsd01-data.bb2.cluster in rds.gpfs.server > 2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data > collection request from 10.20.0.56 > > They always appear to be to a specific type of hardware with the same > Ethernet controller, though the nodes are split across three data > centres and we aren?t seeing link congestion on the links between them. > > On the node I listed above, it?s not actually doing anything either as > the software on it is still being installed (i.e. it?s not doing GPFS > or any other IO other than a couple of home directories). > > Any suggestions on what ?(socket 153) state is unexpected? means? 
> > Thanks > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org_ > __http://gpfsug.org/mailman/listinfo/gpfsug-discuss_ > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Jan 18 15:23:09 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Jan 2019 15:23:09 +0000 Subject: [gpfsug-discuss] DSS-G Message-ID: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Anyone out their with a DSS-G using SR650 servers? We have one and after some hassle we have finally got the access to the software downloads and I have been reading through the documentation to familiarize myself with the upgrade procedure. Skipping over the shear madness of that which appears to involved doing a complete netboot reisntall of the nodes for every upgrade, it looks like we have wrong hardware. It all came in a Lenovo rack with factory cabling so one assumes it would be correct. However the "Manufactoring Preload Procedure" document says The DSS-G installation scripts assume that IPMI access to the servers is set up through the first regular 1GbE Ethernet port of the server (marked with a green star in figure 21) in shared mode, not through the dedicated IPMI port under the first three PCIe slots of the SR650 server?s back, and not on the lower left side of the x3650 M5 server?s back. Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to the dedicated IPMI port. Oh great, reinstalling the OS for an update is already giving me the screaming heebie jeebies, but now my factory delivered setup is wrong. So in my book increased chance of the install procedure writing all over the disks during install and blowing away the NSD's. Last time I was involved in an net install of RHEL (well CentOS but makes little difference) onto a GPFS not with attached disks the installer wrote all over the NSD descriptors and destroyed the file system. So before one plays war with Lenovo for shipping an unsupported configuration I was wondering how other DSS-G's with SR650's have come from the factory. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From S.J.Thompson at bham.ac.uk Fri Jan 18 16:02:48 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 18 Jan 2019 16:02:48 +0000 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Message-ID: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> I have several. One of mine was shipped for customer rack (which happened to be an existing Lenovo rack anyway), the other was based on 3560m5 so cabled differently then anyway (and its now a franken DSS-G as we upgraded the servers to SR650 and added an SSD tray, but I have so much non-standard Lenovo config stuff in our systems ....) If you bond the LOM ports together then you can't use the XCC in shared mode. But the installer scripts will make it shared when you reinstall/upgrade. Well, it can half work in some cases depending on how you have your switch connected. For example we set the switch to fail back to non-bond mode (relatively common now), which is find when the OS is not booted, you can talk to XCC. But as soon as the OS boots and it bonds, the switch port turns into a bond/trunk port and BAM, you can no longer talk to the XCC port. We have an xcat post script to put it back to being dedicated on the XCC port. So during install you lose access for a little while whilst the Lenovo script runs before my script puts it back again. And if you read the upgrade guide, then it tells you to unplug the SAS ports before doing the reinstall (OK I haven't checked the 2.2a upgrade guide, but it always did). HOWEVER, the xcat template for DSS-G should also black list the SAS driver to prevent it seeing the attached JBOD storage. AND GPFS now writes proper GPT headers as well to the disks which the installer should then leave alone. (But yes, haven't we all done an install and wiped the disk headers ... GPFS works great until you try to mount the file-system sometime later) On the needing to reinstall ... I agree I don't like the reinstall to upgrade between releases, but if you look what it's doing it sorta half makes sense. For example it force flashes an exact validated firmware onto the SAS cards and forces the port config etc onto the card to being in a known current state. I don't like it, but I see why it's done like that. We have in the past picked the relevant bits out (e.g. disk firmware and GPFS packages), and done just those, THIS IS NOT SUPPORTED, but we did pick it apart to see what had changed. If you go to 2.2a as well, the gui is now moved out (it was a bad idea to install on the DSS-G nodes anyway I'm sure), and the pmcollector package magically doesn't get installed either on the DSS-G nodes. Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will flash the firmware to Intel 4.0 release for the X722. And that doesn't work if you have Mellanox Ethernet switches running Cumulus. (we proved it was the firmware by upgrading another SR650 to the latest firmware and suddenly it no longer works) - you won't get a link up, even at PXE time so not a driver issue. And if you have a VDX switch you need another workaround ... Simon ?On 18/01/2019, 15:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" wrote: Anyone out their with a DSS-G using SR650 servers? We have one and after some hassle we have finally got the access to the software downloads and I have been reading through the documentation to familiarize myself with the upgrade procedure. 
Skipping over the shear madness of that which appears to involved doing a complete netboot reisntall of the nodes for every upgrade, it looks like we have wrong hardware. It all came in a Lenovo rack with factory cabling so one assumes it would be correct. However the "Manufactoring Preload Procedure" document says The DSS-G installation scripts assume that IPMI access to the servers is set up through the first regular 1GbE Ethernet port of the server (marked with a green star in figure 21) in shared mode, not through the dedicated IPMI port under the first three PCIe slots of the SR650 server?s back, and not on the lower left side of the x3650 M5 server?s back. Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to the dedicated IPMI port. Oh great, reinstalling the OS for an update is already giving me the screaming heebie jeebies, but now my factory delivered setup is wrong. So in my book increased chance of the install procedure writing all over the disks during install and blowing away the NSD's. Last time I was involved in an net install of RHEL (well CentOS but makes little difference) onto a GPFS not with attached disks the installer wrote all over the NSD descriptors and destroyed the file system. So before one plays war with Lenovo for shipping an unsupported configuration I was wondering how other DSS-G's with SR650's have come from the factory. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathan.buzzard at strath.ac.uk Fri Jan 18 17:14:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 18 Jan 2019 17:14:52 +0000 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> <70C48D1B-4E99-4831-A9D9-AFD326154D8A@bham.ac.uk> Message-ID: <901117abe1768c9d02aae3b6cc9b5cf47dc3cc97.camel@strath.ac.uk> On Fri, 2019-01-18 at 16:02 +0000, Simon Thompson wrote: [SNIP] > > If you bond the LOM ports together then you can't use the XCC in > shared mode. But the installer scripts will make it shared when you > reinstall/upgrade. Well, it can half work in some cases depending on > how you have your switch connected. For example we set the switch to > fail back to non-bond mode (relatively common now), which is find > when the OS is not booted, you can talk to XCC. But as soon as the OS > boots and it bonds, the switch port turns into a bond/trunk port and > BAM, you can no longer talk to the XCC port. We don't have that issue :-) Currently there is nothing plugged into the LOM because we are using the Mellanox ConnectX4 card for bonded 40Gbps Ethernet to carry the GPFS traffic in the main with one of the ports on the two cards set to Infiniband so the storage can be mounted on an old cluster which only has 1Gb Ethernet (new cluster uses 10GbE networking to carry storage). However we have a shortage of 10GbE ports and the documentation says it should be 1GbE anyway, hence asking what Lenovo might have shipped to other people, as we have a disparity between what has been shipped and what the documentation says it should be like. 
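On the worry above about a reinstall writing over the NSD descriptors: alongside physically detaching the SAS cabling, one belt-and-braces option is to keep the installer environment from ever loading the JBOD HBA driver, in the spirit of the blacklisting Simon says the xcat template does. This is only a sketch; the module name mpt3sas is an assumption and would need checking against the HBA actually fitted (e.g. lsmod | grep -i sas on a running node) before relying on it.

# Hypothetical boot argument for the installer kernel, keeping the SAS HBA driver unloaded
# (mpt3sas is an assumption - substitute the driver your HBA actually uses):
modprobe.blacklist=mpt3sas

# Or persistently in the installed image via modprobe configuration:
echo "blacklist mpt3sas" > /etc/modprobe.d/dssg-jbod.conf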
[SNIP] > And if you read the upgrade guide, then it tells you to unplug the > SAS ports before doing the reinstall (OK I haven't checked the 2.2a > upgrade guide, but it always did). Well the 2.2a documentation does not say anything about that :-) I had basically decided however it was going to be necessary for safety purposes. While I do have a full backup of the file system I don't want to have to use it. > HOWEVER, the xcat template for DSS-G should also black list the SAS > driver to prevent it seeing the attached JBOD storage. AND GPFS now > writes proper GPT headers as well to the disks which the installer > should then leave alone. (But yes, haven't we all done an install and > wiped the disk headers ... GPFS works great until you try to mount > the file-system sometime later) Well I have never wiped my NSD's, just the numpty getting ready to prepare the CentOS6 upgrade for the cluster forgot to unzone the storage arrays (cluster had FC attached storage to all nodes for performance reasons, back in the day 4Gb FC was a lot cheaper than 10GbE and 1GbE was not fast enough) and wiped it for me :-( > On the needing to reinstall ... I agree I don't like the reinstall to > upgrade between releases, but if you look what it's doing it sorta > half makes sense. For example it force flashes an exact validated > firmware onto the SAS cards and forces the port config etc onto the > card to being in a known current state. I don't like it, but I see > why it's done like that. Except that does not require a reinstall of the OS to achieve. Reinstalling from scratch for an update is complete madness IMHO. > > If you go to 2.2a as well, the gui is now moved out (it was a bad > idea to install on the DSS-G nodes anyway I'm sure), and the > pmcollector package magically doesn't get installed either on the > DSS-G nodes. > Currently we don't have the GUI installed anywhere. I am not sure I trust IBM yet to not change the GUI completely again to be bothered getting it to work. > Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will > flash the firmware to Intel 4.0 release for the X722. And that > doesn't work if you have Mellanox Ethernet switches running > Cumulus. (we proved it was the firmware by upgrading another SR650 > to the latest firmware and suddenly it no longer works) - you won't > get a link up, even at PXE time so not a driver issue. And if you > have a VDX switch you need another workaround ... > We have Lenovo switches, so hopefully Lenovo tested with their own switches work ;-) Mind you I get this running the dssgcktopology tool Warning: Unsupported configuration of odd number of enclosures detected. Which nitwit wrote that script then? From the "Manufacturing Preload Procedure" for 2.2a on page 9 For the high density DSS models DSS-G210, DSS-G220, DSS-G240 and DSS-G260 with 3.5? NL-SAS disks (7.2k RPM), the DSS-G building block contains one, two, four or six Lenovo D3284 disk enclosures. Right so what is it then? Because one enclosure which is clearly an odd number of enclosures is allegedly an unsupported configuration according to the tool, but supported according to the documentation!!! JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. 
G4 0NG From matthew.robinson02 at gmail.com Fri Jan 18 19:25:35 2019 From: matthew.robinson02 at gmail.com (Matthew Robinson) Date: Fri, 18 Jan 2019 14:25:35 -0500 Subject: [gpfsug-discuss] DSS-G In-Reply-To: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> References: <39dbe81fed83a72913decd11cc3f55e04ef5bb08.camel@strath.ac.uk> Message-ID: Hi Jonathan, In the last DSS 2.x tarballs there should a PDG included. This should provide alot of detail going over the solutions configuration and common problems for troubleshooting. Or at least the Problem Determantion Guide was there be for my department let me go. The shared IMM port is pretty standard from the 3650 to the SD530's for the most part. You should have a port marked shared on either and the IPMI interace is to be shared mode for dual subnet masks on the same NIC. This is is the standard xcat configuration from Sourcforge. If I am not mistaken the PDG should be stored in the first DSS-G version tarball for reference. Hope this helps, Matthew Robinson On Fri, Jan 18, 2019 at 10:23 AM Jonathan Buzzard < jonathan.buzzard at strath.ac.uk> wrote: > > Anyone out their with a DSS-G using SR650 servers? > > We have one and after some hassle we have finally got the access to the > software downloads and I have been reading through the documentation to > familiarize myself with the upgrade procedure. > > Skipping over the shear madness of that which appears to involved doing > a complete netboot reisntall of the nodes for every upgrade, it looks > like we have wrong hardware. It all came in a Lenovo rack with factory > cabling so one assumes it would be correct. > > However the "Manufactoring Preload Procedure" document says > > The DSS-G installation scripts assume that IPMI access to the > servers is set up through the first regular 1GbE Ethernet port > of the server (marked with a green star in figure 21) in shared > mode, not through the dedicated IPMI port under the first three > PCIe slots of the SR650 server?s back, and not on the lower left > side of the x3650 M5 server?s back. > > Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to > the dedicated IPMI port. Oh great, reinstalling the OS for an update is > already giving me the screaming heebie jeebies, but now my factory > delivered setup is wrong. So in my book increased chance of the install > procedure writing all over the disks during install and blowing away > the NSD's. Last time I was involved in an net install of RHEL (well > CentOS but makes little difference) onto a GPFS not with attached disks > the installer wrote all over the NSD descriptors and destroyed the file > system. > > So before one plays war with Lenovo for shipping an unsupported > configuration I was wondering how other DSS-G's with SR650's have come > from the factory. > > JAB. > > -- > Jonathan A. Buzzard Tel: +44141-5483420 > HPC System Administrator, ARCHIE-WeSt. > University of Strathclyde, John Anderson Building, Glasgow. G4 0NG > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Matthew Robinson Comptia A+, Net+ 919.909.0494 matthew.robinson02 at gmail.com The greatest discovery of my generation is that man can alter his life simply by altering his attitude of mind. - William James, Harvard Psychologist. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Renar.Grunenberg at huk-coburg.de Mon Jan 21 15:59:29 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 21 Jan 2019 15:59:29 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Mon Jan 21 16:03:13 2019 From: spectrumscale at kiranghag.com (KG) Date: Mon, 21 Jan 2019 21:33:13 +0530 Subject: [gpfsug-discuss] Dr site using full replication? Message-ID: Hi Folks Has anyone replicated scale node to a dr site by replicating boot disks and nsd ? The same hostnames and ip subnet would be available on the other site and cluster should be able to operate from any one location at a time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Jan 21 16:02:50 2019 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 21 Jan 2019 16:02:50 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: References: <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu> Hi All, I just wanted to follow up on this thread ? the only way I have found to obtain a list of filesets and their associated junction paths as a non-root user is via the REST API (and thanks to those who suggested that). 
However, AFAICT querying the REST API via a script would expose the username / password used to do so to anyone who bothered to look at the code, which would in turn allow a knowledgeable and curious user to query the REST API themselves for other information we do not necessarily want to expose to them. Therefore, it is not an acceptable solution to us. Therefore, unless someone responds with a way to allow a non-root user to obtain fileset junction paths that doesn?t involve the REST API, I?m afraid I?m at a dead end in terms of making our quota usage Python script something that I can share with the broader community. It just has too much site-specific code in it. Sorry? Kevin P.S. In case you?re curious about how the quota script is obtaining those junction paths ? we have a cron job that runs once per hour on the cluster manager that dumps the output of mmlsfileset to a text file, which the script then reads. The cron job used to just run once per day and used to just run mmlsfileset. I have modified it to be a shell script which checks for the load average on the cluster manager being less than 10 and that there are no waiters of more than 10 seconds duration. If both of those conditions are true, it runs mmlsfileset. If either are not, it simply exits ? the idea being that one or both of those would likely be true if something were going on with the cluster manager that would cause the mmlsfileset to hang. I have also modified the quota script itself so that it checks that the junction path for a fileset actually exists before attempting to stat it (duh - should?ve done that from the start), which handles the case where a user would run the quota script and it would bomb off with an exception because the fileset was deleted and the cron job hadn?t run yet. If a new fileset is created, well, it just won?t get checked by the quota script until the cron job runs successfully. We have decided that this is an acceptable compromise. On Jan 15, 2019, at 8:46 AM, Marc A Kaplan > wrote: Personally, I agree that there ought to be a way in the product. In the meawhile, you no doubt already have some ways to tell your users where to find their filesets as pathnames. Otherwise, how are they accessing their files? And to keep things somewhat sane, I'd bet filesets are all linked to one or small number of well known paths in the filesystem. Like /AGpfsFilesystem/filesets/... Plus you could add symlinks and/or as has been suggested post info extracted from mmlsfileset and/or mmlsquota. So as a practical matter, is this an urgent problem...? Why? How? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeverdon at us.ibm.com Mon Jan 21 22:41:26 2019 From: jeverdon at us.ibm.com (Jodi E Everdon) Date: Mon, 21 Jan 2019 17:41:26 -0500 Subject: [gpfsug-discuss] post to list Message-ID: Jodi Everdon IBM New Technology Introduction (NTI) 2455 South Road Client Experience Validation Poughkeepsie, NY 12601 Email: jeverdon at us.ibm.com North America IBM IT Infrastructure: www.ibm.com/it-infrastructure -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 15606074.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From scale at us.ibm.com Mon Jan 21 23:34:31 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 21 Jan 2019 15:34:31 -0800 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Message-ID: Hello Renar, A few things to try: Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "'gpfsug-discuss at spectrumscale.org'" Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. Adaptation of the hosts file: 127.0.0.1 localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=frR4WiYT89JSgLnJMtRAlESzRXWW2YatEwsuuV8M810&s=FSjMBxMo8G8y3VR2A59hgIWaHPKPFNHU7RXcneIVCPE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Tue Jan 22 07:36:15 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Tue, 22 Jan 2019 07:36:15 +0000 Subject: [gpfsug-discuss] Get list of filesets_without_runningmmlsfileset? In-Reply-To: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu> References: <60451989-2E0B-4CF9-A6E2-BC0939169311@vanderbilt.edu>, <5DCBA252-33B7-4AC9-B2E3-10DA6881B1AA@vanderbilt.edu> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128480.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128481.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15481420128482.png Type: image/png Size: 1134 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Jan 22 14:35:02 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 22 Jan 2019 14:35:02 +0000 Subject: [gpfsug-discuss] Node expels In-Reply-To: References: Message-ID: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> OK we think we might have a reason for this. We run iptables on some of our management function nodes, and we found that in some cases, our config management tool can cause a ?systemctl restart iptables? to occur (the rule ordering generation was non deterministic meaning it could shuffle rules ? we fixed that and made it reload rather than restart). Which takes a fraction of a second, but it appears that this is sufficient for GPFS to get into a state. What I didn?t mention before was that we could get it into a state where the only way to recover was to shutdown the storage cluster and restart it. I?m not sure why normal expel and recovery doesn?t appear to work in this case, though we?re not 100% certain that its iptables restart. (we just have a very smoky gun at present). (I have a ticket with that question open). Maybe it?s a combination of having a default DROP policy on iptables as well - we have also switched to ACCEPT and added a DROP rule at the end of the ruleset which gives the same result. Simon From: on behalf of "jlewars at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 17 January 2019 at 14:31 To: Tomer Perry , "gpfsug-discuss at spectrumscale.org" Cc: Yong Ze Chen Subject: Re: [gpfsug-discuss] Node expels >They always appear to be to a specific type of hardware with the same Ethernet controller, That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits will differ with different ethernet adapters). 
The expel section in the slides on this page has been expanded to include a 'debugging expels section' (slides 19-20, which also reference ring buffer tuning): https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381 Regards, John Lewars Spectrum Scale Performance, IBM Poughkeepsie From: Tomer Perry/Israel/IBM To: gpfsug main discussion list Cc: John Lewars/Poughkeepsie/IBM at IBMUS, Yong Ze Chen/China/IBM at IBMCN Date: 01/17/2019 08:28 AM Subject: Re: [gpfsug-discuss] Node expels ________________________________ Hi, I was asked to elaborate a bit ( thus also adding John and Yong Ze Chen). As written on the slide: One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ?extra? log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes. So, the idea is that in expel situation, we dump the socket state from the OS ( you can see the same using 'ss -i' for example). In your example, it shows that the ca_state is 4, there are retransmits, high rto and all the point to a network problem. You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Tomer Perry" To: gpfsug main discussion list Date: 17/01/2019 13:46 Subject: Re: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Simon, Take a look at http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdfslide 13. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 17/01/2019 13:35 Subject: [gpfsug-discuss] Node expels Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We?ve recently been seeing quite a few node expels with messages of the form: 2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5 2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster 2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode 2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon. 2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message. On the client node, we see messages of the form: 2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33 2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node. 
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56

They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren't seeing link congestion on the links between them. On the node I listed above, it's not actually doing anything either, as the software on it is still being installed (i.e. it's not doing GPFS or any other IO other than a couple of home directories).

Any suggestions on what "(socket 153) state is unexpected" means?

Thanks

Simon

_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL:

From rmoye at quantlab.com Tue Jan 22 15:43:26 2019 From: rmoye at quantlab.com (Roger Moye) Date: Tue, 22 Jan 2019 15:43:26 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> Message-ID: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com>

We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster, which was entirely Linux, and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help.

-Roger

From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays

Hello Renar,

A few things to try:
1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host <hostname>", with <hostname> being itself and each and every node in the cluster. Make sure mmcmi prints a valid IPv4 address.
2. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts".
3. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit).

You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.
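To make the suggestions above a little more concrete, a hedged sketch follows; the node names and addresses are invented placeholders, and DEBUG=1 is the tracing switch already mentioned in the reply:

# c:\windows\system32\drivers\etc\hosts  (one IPv4 entry per cluster node)
10.1.1.11   win-node1.example.com   win-node1
10.1.1.12   win-node2.example.com   win-node2

# from the GPFS Cygwin ksh: check address resolution, then trace a slow command
mmcmi host win-node1
DEBUG=1 mmlscluster 2>&1 | tee /tmp/mmlscluster.debug.out

The DEBUG output prints the individual commands the script runs (as can be seen later in this thread), so long pauses can be matched to the step that causes them, such as name lookups, remote shell calls, or temp file handling.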
From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing "/cygdrive/..." * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. 
If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Tue Jan 22 17:10:24 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 22 Jan 2019 17:10:24 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> Message-ID: Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. 
-Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL:

From Achim.Rehor at de.ibm.com Tue Jan 22 18:18:03 2019 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Tue, 22 Jan 2019 19:18:03 +0100 Subject: [gpfsug-discuss] Node expels In-Reply-To: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> References: <0B0D4ACE-1B54-4D22-85E3-B3154DD7C943@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL:

From Renar.Grunenberg at huk-coburg.de Wed Jan 23 12:45:39 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Wed, 23 Jan 2019 12:45:39 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: <5b711884338047fbaf5a61005964652d@SMXRF105.msg.hukrf.de> <18bab23b080c4ad487c68b8ebc04b975@quantlab.com> Message-ID: <349cb338583a4c1d996677837fc65b6e@SMXRF105.msg.hukrf.de>

Hallo All, as a point to the problem, it seems to be that all the delays are happening here:

DEBUG=1 mmgetstate -a
.....
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256

Any pointers on this, or whether it will be fixed in the near future, are welcome.
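A small timing check along these lines may help confirm whether it is really the cleanup of the temp files, or Cygwin process creation in general, that is slow; this is only a sketch and the file names are dummies:

# time the removal of a handful of scratch files
for i in 1 2 3 4 5; do touch /tmp/mmdbg.$i; done
time /bin/rm -f /tmp/mmdbg.*

# time raw process creation, a common Cygwin bottleneck
time ( for i in 1 2 3 4 5 6 7 8 9 10; do /bin/true; done )

If the loop of /bin/true already takes seconds, the delay is in fork/exec (often made worse by realtime anti-virus scanning of c:\cygwin64, as suggested earlier in the thread) rather than in rm itself.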
Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). 
You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). 
All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, or retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Thu Jan 24 14:29:42 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 24 Jan 2019 14:29:42 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? Message-ID: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Hello, a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too. Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports using uid/gid of local accounts. We manage gids on the server side and allow NFS v3 client access only. I crosspost this to ganesha support and to the gpfsug mailing list. Thank you, Heiner Billich ganesha version: 2.5.3-ibm028.00.el7.x86_64 the ganesha config CacheInode { fd_hwmark_percent=60; fd_lwmark_percent=20; fd_limit_percent=90; lru_run_interval=90; entries_hwmark=1500000; } NFS_Core_Param { clustered=TRUE; rpc_max_connections=10000; heartbeat_freq=0; mnt_port=33247; nb_worker=256; nfs_port=2049; nfs_protocols=3,4; nlm_port=33245; rquota_port=33246; rquota_port=33246; short_file_handle=FALSE; mount_path_pseudo=true; } GPFS { fsal_grace=FALSE; fsal_trace=TRUE; } NFSv4 { delegations=FALSE; domainname=virtual1.com; grace_period=60; lease_lifetime=60; } Export_Defaults { access_type=none; anonymous_gid=-2; anonymous_uid=-2; manage_gids=TRUE; nfs_commit=FALSE; privilegedport=FALSE; protocols=3,4; sectype=sys; squash=root_squash; transports=TCP; } one export # === START /**** id=206 nclients=3 === EXPORT { Attr_Expiration_Time=60; Delegations=none; Export_id=206; Filesystem_id=42.206; MaxOffsetRead=18446744073709551615; MaxOffsetWrite=18446744073709551615; MaxRead=1048576; MaxWrite=1048576; Path="/****"; PrefRead=1048576; PrefReaddir=1048576; PrefWrite=1048576; Pseudo="/****"; Tag="****"; UseCookieVerifier=false; FSAL { Name=GPFS; } CLIENT { # === ****/X12SA === Access_Type=RW; Anonymous_gid=-2; Anonymous_uid=-2; Clients=X.Y.A.B/24; Delegations=none; Manage_Gids=TRUE; NFS_Commit=FALSE; PrivilegedPort=FALSE; Protocols=3; SecType=SYS; Squash=Root; Transports=TCP; } ?. 
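For reference while reading the replies further down in this thread: the block below is a hypothetical per-client variant of the export above in which only Manage_Gids is flipped to FALSE, so that the GIDs sent by the client are used instead of being resolved on the server. It is a test sketch, not a recommendation.

CLIENT {
    # reduced copy of the client block above; only Manage_Gids differs
    Access_Type=RW;
    Clients=X.Y.A.B/24;
    Manage_Gids=FALSE;
    Protocols=3;
    SecType=SYS;
    Squash=Root;
    Transports=TCP;
}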
-- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch -------------- next part -------------- An HTML attachment was scrubbed... URL: From truongv at us.ibm.com Thu Jan 24 18:17:45 2019 From: truongv at us.ibm.com (Truong Vu) Date: Thu, 24 Jan 2019 13:17:45 -0500 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: Message-ID: Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. /bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. 
________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. 
Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20190123_eff7ad74_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=JWv1FytE6pkOdJtqJV5sSVf3ZwV0B9FDZmfzI7LQEGk&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=jDHxo2hE5uOrH5xaI6YYQdQ-O5yZG-udF7ooPNOEUUM&s=UBffyp1tO8WZsaCys72XHljL9SyUe_v4ECCmymP17Lg&e= End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heiner.billich at psi.ch Fri Jan 25 09:13:53 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Fri, 25 Jan 2019 09:13:53 +0000 Subject: [gpfsug-discuss] [NFS-Ganesha-Support] does ganesha deny access for unknown UIDs? In-Reply-To: <35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com> References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> <35897363-6096-89e9-d22c-ba97ad10c26f@redhat.com> Message-ID: <1F7557E9-FE60-4F37-BA0A-FD4C37E124BD@psi.ch> Hello Daniel, thank you. The clients do NFS v3 mounts, hence idmap is no option - as I know it's used in NFS v4 to map between uid/guid and names only? For a process to switch to a certain uid/guid in general one does not need a matching passwd entry? I see that with ACLs you get issues as they use names, and you can't do a server-side group membership lookup, and there may be more subtle issues. Anyway, I'll create the needed accounts on the server. By the way: We had the same issue with Netapp filers and it took a while to find the configuration option to allow 'unknown' uid/gid to access a nfs v3 export. I'll try to reproduce on a test system with increased logging to see what exactly goes wrong and maybe ask later to add a configuration option to ganesha to switch to a behaviour more similar to kernel-nfs. Many client systems at my site are legacy and run various operating systems, hence a complete switch to NFS v4 is unlikely to happen soon. 
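A minimal sketch of the workaround mentioned above, i.e. creating the matching account on the Ganesha servers; the user and group names and the numeric IDs are placeholders and have to match whatever the appliance uses on the client side:

# on every CES/Ganesha node, create a non-login account with the
# uid/gid the NFSv3 client writes with
groupadd -g 3001 appliancegrp
useradd -u 3001 -g appliancegrp -M -s /sbin/nologin applianceuser

Once a matching passwd/group entry exists, the managed-GID lookup the server performs for each request can succeed and the permission-denied errors for that account should go away.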
cheers, Heiner -- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch ?On 24/01/19 16:35, "Daniel Gryniewicz" wrote: Hi. For local operating FSALs (like GPFS and VFS), the way Ganesha makes sure that a UID/GID combo has the correct permissions for an operation is to set the UID/GID of the thread to the one in the operation, then perform the actual operation. This way, the kernel and the underlying filesystem perform atomic permission checking on the op. This setuid/setgid will fail, of course, if the local system doesn't have that UID/GID to set to. The solution for this is to use NFS idmap to map the remote ID to a local one. This includes the ability to map unknown IDs to some local ID. Daniel On 1/24/19 9:29 AM, Billich Heinrich Rainer (PSI) wrote: > Hello, > > a local account on a nfs client couldn?t write to a ganesha nfs export > even with directory permissions 777. The solution was to create the > account on the ganesha servers, too. > > Please can you confirm that this is the intended behaviour? is there an > option to change this and to map unknown accounts to nobody instead? We > often have embedded Linux appliances or similar as nfs clients which > need to place some data on the nfs exports using uid/gid of local accounts. > > We manage gids on the server side and allow NFS v3 client access only. > > I crosspost this to ganesha support and to the gpfsug mailing list. > > Thank you, > > Heiner Billich > > ganesha version: 2.5.3-ibm028.00.el7.x86_64 From andy_kurth at ncsu.edu Fri Jan 25 16:08:12 2019 From: andy_kurth at ncsu.edu (Andy Kurth) Date: Fri, 25 Jan 2019 11:08:12 -0500 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Message-ID: I believe this is occurring because of the manage_gids=TRUE setting. The purpose of this setting is to overcome the AUTH_SYS 16 group limit. If true, Ganesha takes the UID and resolves all of the GIDs on the server. If false, the GIDs sent by the client are used. I ran a quick test by creating a local user on the client and exporting 2 shares with 777 permissions, one with manage_gids=TRUE and one with FALSE. The user could view the share and create files with manage_gids=FALSE. ganesha.log showed that it tried and failed to resolve the UID to a name, but allowed the operation nonetheless: 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_uid_to_name failed with code -2. 
2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 779 failed, using numeric owner With manage_gids=TRUE, the client received permission denied and ganesha.log showed the GID query failing: 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No matching password record found for uid 779 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch managed_gids failed Hope this helps, Andy Kurth / NC State University On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) < heiner.billich at psi.ch> wrote: > Hello, > > > > a local account on a nfs client couldn?t write to a ganesha nfs export > even with directory permissions 777. The solution was to create the account > on the ganesha servers, too. > > > > Please can you confirm that this is the intended behaviour? is there an > option to change this and to map unknown accounts to nobody instead? We > often have embedded Linux appliances or similar as nfs clients which need > to place some data on the nfs exports using uid/gid of local accounts. > > > > We manage gids on the server side and allow NFS v3 client access only. > > > > I crosspost this to ganesha support and to the gpfsug mailing list. > > > > Thank you, > > > > Heiner Billich > > > > ganesha version: 2.5.3-ibm028.00.el7.x86_64 > > > > the ganesha config > > > > CacheInode > > { > > fd_hwmark_percent=60; > > fd_lwmark_percent=20; > > fd_limit_percent=90; > > lru_run_interval=90; > > entries_hwmark=1500000; > > } > > NFS_Core_Param > > { > > clustered=TRUE; > > rpc_max_connections=10000; > > heartbeat_freq=0; > > mnt_port=33247; > > nb_worker=256; > > nfs_port=2049; > > nfs_protocols=3,4; > > nlm_port=33245; > > rquota_port=33246; > > rquota_port=33246; > > short_file_handle=FALSE; > > mount_path_pseudo=true; > > } > > GPFS > > { > > fsal_grace=FALSE; > > fsal_trace=TRUE; > > } > > NFSv4 > > { > > delegations=FALSE; > > domainname=virtual1.com; > > grace_period=60; > > lease_lifetime=60; > > } > > Export_Defaults > > { > > access_type=none; > > anonymous_gid=-2; > > anonymous_uid=-2; > > manage_gids=TRUE; > > nfs_commit=FALSE; > > privilegedport=FALSE; > > protocols=3,4; > > sectype=sys; > > squash=root_squash; > > transports=TCP; > > } > > > > one export > > > > # === START /**** id=206 nclients=3 === > > EXPORT { > > Attr_Expiration_Time=60; > > Delegations=none; > > Export_id=206; > > Filesystem_id=42.206; > > MaxOffsetRead=18446744073709551615; > > MaxOffsetWrite=18446744073709551615; > > MaxRead=1048576; > > MaxWrite=1048576; > > Path="/****"; > > PrefRead=1048576; > > PrefReaddir=1048576; > > PrefWrite=1048576; > > Pseudo="/****"; > > Tag="****"; > > UseCookieVerifier=false; > > FSAL { > > Name=GPFS; > > } > > CLIENT { > > # === ****/X12SA === > > Access_Type=RW; > > Anonymous_gid=-2; > > Anonymous_uid=-2; > > Clients=X.Y.A.B/24; > > Delegations=none; > > Manage_Gids=TRUE; > > NFS_Commit=FALSE; > > PrivilegedPort=FALSE; > > Protocols=3; > > SecType=SYS; > > Squash=Root; > > Transports=TCP; > > } > > ?. 
> > -- > > Paul Scherrer Institut > > Heiner Billich > > System Engineer Scientific Computing > > Science IT / High Performance Computing > > WHGA/106 > > Forschungsstrasse 111 > > 5232 Villigen PSI > > Switzerland > > > > Phone +41 56 310 36 02 > > heiner.billich at psi.ch > > https://www.psi.ch > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Andy Kurth* Research Storage Specialist NC State University Office of Information Technology P: 919-513-4090 311A Hillsborough Building Campus Box 7109 Raleigh, NC 27695 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Fri Jan 25 18:07:06 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 25 Jan 2019 18:07:06 +0000 Subject: [gpfsug-discuss] FW: 'Flash (Alert): IBM Spectrum Scale (GPFS) V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may result in undetected data corruption or... Message-ID: [cid:forums.png] gpfs at us.ibm.com created a topic named Flash (Alert): IBM Spectrum Scale (GPFS) V4.1.1.0 through 5.0.1.1: a read from or write to a DMAPI-migrated file may result in undetected data corruption or a recall failure in the General Parallel File System - Announce (GPFS - Announce) forum. Abstract IBM has identified a problem in IBM Spectrum Scale V4.1.1.0 through 5.0.1.1, in which under some conditions reading a DMAPI-migrated file may return zeroes instead of the actual data. Further, a DMAPI-migrate operation or writing to a DMAPI-migrated file may cause the size of the stub file to be updated incorrectly, which may cause a mismatch between the file size recorded in the stub file and in the migrated object. This may result in failure of a manual or transparent recall, when triggered by a subsequent read from or write to the file. See the complete bulletin at: http://www.ibm.com/support/docview.wss?uid=ibm10741243 Open this item Posting Date: Friday, January 25, 2019 at 11:31:20 AM EST To unsubscribe or change settings, please go to your developerWorks community Settings. This is a notification sent from developerWorks community. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 289 bytes Desc: image001.png URL: From S.J.Thompson at bham.ac.uk Fri Jan 25 18:28:27 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 25 Jan 2019 18:28:27 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: References: <44051794-8F45-4725-92E0-09729474E7A1@psi.ch>, Message-ID: Note there are other limitations introduced by setting manage_gids. Whilst you get round the 16 group limit, instead ACLs are not properly interpreted to provide user access when an ACL is in place. In a PMR were told the only was around this would be to user sec_krb. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Andy Kurth [andy_kurth at ncsu.edu] Sent: 25 January 2019 16:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] does ganesha deny access for unknown UIDs? I believe this is occurring because of the manage_gids=TRUE setting. The purpose of this setting is to overcome the AUTH_SYS 16 group limit. 
If true, Ganesha takes the UID and resolves all of the GIDs on the server. If false, the GIDs sent by the client are used. I ran a quick test by creating a local user on the client and exporting 2 shares with 777 permissions, one with manage_gids=TRUE and one with FALSE. The user could view the share and create files with manage_gids=FALSE. ganesha.log showed that it tried and failed to resolve the UID to a name, but allowed the operation nonetheless: 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :nfs4_uid_to_name failed with code -2. 2019-01-25 10:19:03 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-30] xdr_encode_nfs4_princ :ID MAPPER :INFO :Lookup for 779 failed, using numeric owner With manage_gids=TRUE, the client received permission denied and ganesha.log showed the GID query failing: 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] uid2grp_allocate_by_uid :ID MAPPER :INFO :No matching password record found for uid 779 2019-01-25 10:19:27 : epoch 0001004c : gpfs-proto.domain : ganesha.nfsd-123297[work-39] nfs_req_creds :DISP :INFO :Attempt to fetch managed_gids failed Hope this helps, Andy Kurth / NC State University On Thu, Jan 24, 2019 at 9:36 AM Billich Heinrich Rainer (PSI) > wrote: Hello, a local account on a nfs client couldn?t write to a ganesha nfs export even with directory permissions 777. The solution was to create the account on the ganesha servers, too. Please can you confirm that this is the intended behaviour? is there an option to change this and to map unknown accounts to nobody instead? We often have embedded Linux appliances or similar as nfs clients which need to place some data on the nfs exports using uid/gid of local accounts. We manage gids on the server side and allow NFS v3 client access only. I crosspost this to ganesha support and to the gpfsug mailing list. Thank you, Heiner Billich ganesha version: 2.5.3-ibm028.00.el7.x86_64 the ganesha config CacheInode { fd_hwmark_percent=60; fd_lwmark_percent=20; fd_limit_percent=90; lru_run_interval=90; entries_hwmark=1500000; } NFS_Core_Param { clustered=TRUE; rpc_max_connections=10000; heartbeat_freq=0; mnt_port=33247; nb_worker=256; nfs_port=2049; nfs_protocols=3,4; nlm_port=33245; rquota_port=33246; rquota_port=33246; short_file_handle=FALSE; mount_path_pseudo=true; } GPFS { fsal_grace=FALSE; fsal_trace=TRUE; } NFSv4 { delegations=FALSE; domainname=virtual1.com; grace_period=60; lease_lifetime=60; } Export_Defaults { access_type=none; anonymous_gid=-2; anonymous_uid=-2; manage_gids=TRUE; nfs_commit=FALSE; privilegedport=FALSE; protocols=3,4; sectype=sys; squash=root_squash; transports=TCP; } one export # === START /**** id=206 nclients=3 === EXPORT { Attr_Expiration_Time=60; Delegations=none; Export_id=206; Filesystem_id=42.206; MaxOffsetRead=18446744073709551615; MaxOffsetWrite=18446744073709551615; MaxRead=1048576; MaxWrite=1048576; Path="/****"; PrefRead=1048576; PrefReaddir=1048576; PrefWrite=1048576; Pseudo="/****"; Tag="****"; UseCookieVerifier=false; FSAL { Name=GPFS; } CLIENT { # === ****/X12SA === Access_Type=RW; Anonymous_gid=-2; Anonymous_uid=-2; Clients=X.Y.A.B/24; Delegations=none; Manage_Gids=TRUE; NFS_Commit=FALSE; PrivilegedPort=FALSE; Protocols=3; SecType=SYS; Squash=Root; Transports=TCP; } ?. 
-- Paul Scherrer Institut Heiner Billich System Engineer Scientific Computing Science IT / High Performance Computing WHGA/106 Forschungsstrasse 111 5232 Villigen PSI Switzerland Phone +41 56 310 36 02 heiner.billich at psi.ch https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Andy Kurth Research Storage Specialist NC State University Office of Information Technology P: 919-513-4090 311A Hillsborough Building Campus Box 7109 Raleigh, NC 27695 From mnaineni at in.ibm.com Fri Jan 25 19:38:27 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 25 Jan 2019 19:38:27 +0000 Subject: [gpfsug-discuss] does ganesha deny access for unknown UIDs? In-Reply-To: References: , <44051794-8F45-4725-92E0-09729474E7A1@psi.ch> Message-ID: An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Sat Jan 26 01:32:59 2019 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Sat, 26 Jan 2019 09:32:59 +0800 Subject: [gpfsug-discuss] Announcing 2019 March 11th Singapore Spectrum Scale User Group event - call for user case speakers Message-ID: Hello, This is the announcement for the Spectrum Scale Usergroup Singapore on Monday 11th March 2019, Suntec Convention and Exhibition Centre, Singapore. This event is being held in conjunction with SCA19 https://sc-asia.org/ All current Singapore Spectrum Scale User Group event details can be found here: http://bit.ly/2FRur9d We are calling for user case speakers please ? let Ulf, Xiang or myself know if you are available to speak at this Usergroup. Feel free to circulate this event link to all who may need it. Please reserve your tickets now as tickets for places will close soon. There are some great speakers and topics, for details please see the agenda on Eventbrite. We are looking forwards to a great Usergroup in a fabulous venue. Thanks again to NSCC and IBM for helping to arrange the venue and event booking. Regards, Chris Schlipalius IBM Champion 2019 Team Lead, Storage Infrastructure, Data & Visualisation, The Pawsey Supercomputing Centre (CSIRO) 1 Bryce Avenue Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au From Renar.Grunenberg at huk-coburg.de Mon Jan 28 08:36:45 2019 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Mon, 28 Jan 2019 08:36:45 +0000 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: References: Message-ID: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> Hallo Truong Vu, unfortunality the results are the same, the cmd-responce are not what we want. Ok, we want to analyze something with the trace facility and came to following link in the knowledge center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm The docu mentioned that we must copy to windows files, tracefmt.exe and tracelog.exe, but the first one are only available in the DDK-Version 7.1 (W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify where I can find the mentioned files. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. 
in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu Gesendet: Donnerstag, 24. Januar 2019 19:18 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. [Inactive hide details for gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gp]gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" > To: 'gpfsug main discussion list' > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. 
/bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. Januar 2019 18:10 An: 'gpfsug main discussion list' > Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. 
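As a sketch of that role change (node names below are placeholders), the quorum designation can be moved with mmchnode:

# Hypothetical example: drop the quorum role on a Windows node and assign it to a Linux node
mmchnode --nonquorum -N win-node1
mmchnode --quorum -N linux-node1
mmlscluster   # verify the new quorum designations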
-Roger From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. 
This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 105 bytes Desc: image001.gif URL: From scale at us.ibm.com Tue Jan 29 00:20:47 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 28 Jan 2019 16:20:47 -0800 Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays In-Reply-To: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> References: <528da43a668745f38d68c0a82ecb53a3@SMXRF105.msg.hukrf.de> Message-ID: Hello Renar, I have WDK 8.1 installed and it does come with trace*.exe. Check this out: https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/tracefmt If not the WDK, did you try your SDK/VisualStudio folders as indicated in the above link? Nevertheless, I have uploaded trace*.exe here for you to download: ftp testcase.software.ibm.com. Login as anonymous and provide your email as password. cd /fromibm/aix. mget trace*.exe. This site gets scrubbed often, hence download soon before they get deleted. 
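Put together as a single session, those steps look roughly like this (the binary and prompt settings are the usual precautions when fetching .exe files, not something required by the site):

ftp testcase.software.ibm.com
# Name: anonymous
# Password: <your e-mail address>
ftp> binary          # transfer the .exe files in binary mode
ftp> prompt          # turn off per-file confirmation for mget
ftp> cd /fromibm/aix
ftp> mget trace*.exe
ftp> bye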
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" To: "gpfsug-discuss at spectrumscale.org" Date: 01/28/2019 12:38 AM Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org Hallo Truong Vu, unfortunality the results are the same, the cmd-responce are not what we want. Ok, we want to analyze something with the trace facility and came to following link in the knowledge center: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1ins_instracsupp.htm The docu mentioned that we must copy to windows files, tracefmt.exe and tracelog.exe, but the first one are only available in the DDK-Version 7.1 (W2K3), not in the WDK Version 8 or 10. We use W2K12. Can you clarify where I can find the mentioned files. Regards Renar. Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. Von: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Truong Vu Gesendet: Donnerstag, 24. Januar 2019 19:18 An: gpfsug-discuss at spectrumscale.org Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hi Renar, Let's see if it is really the /bin/rm is the problem here. Can you run the command again without cleanup the temp files as follow: DEBUG=1 keepTempFiles=1 mmgetstate -a Thanks, Tru. 
gpfsug-discuss-request---01/23/2019 07:46:30 AM---Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 01/23/2019 07:46 AM Subject: gpfsug-discuss Digest, Vol 84, Issue 32 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Spectrum Scale Cygwin cmd delays (Grunenberg, Renar) ---------------------------------------------------------------------- Message: 1 Date: Wed, 23 Jan 2019 12:45:39 +0000 From: "Grunenberg, Renar" To: 'gpfsug main discussion list' Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Message-ID: <349cb338583a4c1d996677837fc65b6e at SMXRF105.msg.hukrf.de> Content-Type: text/plain; charset="utf-8" Hallo All, as a point to the problem, it seems to be that all the delayes are happening here DEBUG=1 mmgetstate ?a ??.. /bin/rm -f /var/mmfs/gen/mmsdrfs.1256 /var/mmfs/tmp/allClusterNodes.mmgetstate.1256 /var/mmfs/tmp/allQuorumNodes.mmgetstate.1256 /var/mmfs/tmp/allNonQuorumNodes.mmgetstate.1256 /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.pub /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.priv /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.cert /var/mmfs/ssl/stage/tmpKeyData.mmgetstate.1256.keystore /var/mmfs/tmp/nodefile.mmgetstate.1256 /var/mmfs/tmp/diskfile.mmgetstate.1256 /var/mmfs/tmp/diskNamesFile.mmgetstate.1256 Any points to this it will be fixed in the near future are welcome. Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ Von: Grunenberg, Renar Gesendet: Dienstag, 22. 
Januar 2019 18:10 An: 'gpfsug main discussion list' Betreff: AW: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hallo Roger, first thanks fort he tip. But we decided to separate the linux-io-Cluster from the Windows client only cluster, because of security requirements and ssh management requirements. We can use at this point, local named admins on Windows and use on Linux a Deamon and an separated Admin-interface Network for pwless root ssh. Your Hint seems to be CCR related or is this a Cygwin problem. @Spectrum Scale Team: Point1: IP V6 can?t disabled because of applications that want to use this. But the mmcmi cmd are give us already the right ipv4 adresses. Point2. There are no DNS-Issues Point3: We must check these. Any recommendations to Rogers statements? Regards Renar Von: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] Im Auftrag von Roger Moye Gesendet: Dienstag, 22. Januar 2019 16:43 An: gpfsug main discussion list > Betreff: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays We experienced the same issue and were advised not to use Windows for quorum nodes. We moved our Windows nodes into the storage cluster which was entirely Linux and that solved it. If this is not an option, perhaps adding some Linux nodes to your remote cluster as quorum nodes would help. -Roger From: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of IBM Spectrum Scale Sent: Monday, January 21, 2019 5:35 PM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Hello Renar, A few things to try: 1. Make sure IPv6 is disabled. On each Windows node, run "mmcmi host ", with being itself and each and every node in the cluster. Make sure mmcmi prints valid IPv4 address. 1. To eliminate DNS issues, try adding IPv4 entries for each cluster node in "c:\windows\system32\drivers\etc\hosts". 1. If any anti-virus is active, disable realtime scanning on c:\cygwin64 (wherever you installed cygwin 64-bit). You can also try debugging a script, say: (from GPFS ksh): DEBUG=1 mmlscluster, and see what takes time. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Grunenberg, Renar" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 01/21/2019 08:01 AM Subject: [gpfsug-discuss] Spectrum Scale Cygwin cmd delays Sent by: gpfsug-discuss-bounces at spectrumscale.org< mailto:gpfsug-discuss-bounces at spectrumscale.org> ________________________________ Hello All, We test spectrum scale on an windows only Client-Cluster (remote mounted to a linux Cluster) but the execution of mm commands in cygwin is very slow. 
We have tried the following adjustments to increase the execution speed. * We have installed Cygwin Server as a service (cygserver-config). Unfortunately, this resulted in no faster execution. * Adaptation of the hosts file: 127.0.0.1localhost cygdrive wpad to prevent any DNS problems when accessing ?/cygdrive/...? * Started them as Administrator All adjustments have so far not led to any improvement. Are there any hints to enhance the cmd execution time on windows (w2k12 actual used) Regards Renar Renar Grunenberg Abteilung Informatik - Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ----------------------------------------------------------------------------------- The information in this communication and any attachment is confidential and intended solely for the attention and use of the named addressee(s). All information and opinions expressed herein are subject to change without notice. This communication is not to be construed as an offer to sell or the solicitation of an offer to buy any security. Any such offer or solicitation can only be made by means of the delivery of a confidential private offering memorandum (which should be carefully reviewed for a complete description of investment strategies and risks). Any reliance one may place on the accuracy or validity of this information is at their own risk. Past performance is not necessarily indicative of the future results of an investment. All figures are estimated and unaudited unless otherwise noted. If you are not the intended recipient, or a person responsible for delivering this to the intended recipient, you are not authorized to and must not disclose, copy, distribute, o r retain this message or any part of it. In this case, please notify the sender immediately at 713-333-5440 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20190123/eff7ad74/attachment.html > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 84, Issue 32 ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=_PEp_I-F3uzCglEj5raDY1xo2-W6myUCIX1ysChh0lo&s=k9JU3wc7KoJj1VWVVSjjAekQcIEfeJazMkT3BBME-SY&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From cblack at nygenome.org Tue Jan 29 17:23:49 2019 From: cblack at nygenome.org (Christopher Black) Date: Tue, 29 Jan 2019 17:23:49 +0000 Subject: [gpfsug-discuss] Querying size of snapshots Message-ID: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> We have some large filesets (PB+) and filesystems where I would like to monitor delete rates and estimate how much space we will get back as snapshots expire. We only keep 3-4 daily snapshots on this filesystem due to churn. I?ve tried to query the sizes of snapshots using the following command: mmlssnapshot fsname -d --block-size 1T However, this has run for over an hour without producing any results. Metadata is all on flash and I?m not sure why this is taking so long. Does anyone have any insight on this or alternate methods for getting estimates of snapshot sizes? Best, Chris PS I am aware of the warning in docs about the -d option. ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Jan 29 18:24:17 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 29 Jan 2019 15:24:17 -0300 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> Message-ID: 1. First off, let's RTFM ... -d Displays the amount of storage that is used by the snapshot. This operation requires an amount of time that is proportional to the size of the file system; therefore, it can take several minutes or even hours on a large and heavily-loaded file system. This optional parameter can impact overall system performance. Avoid running the mmlssnapshot command with this parameter frequently or during periods of high file system activity. SOOOO.. there's that. 2. Next you may ask, HOW is that? Snapshots are maintained with a "COW" strategy -- They are created quickly, essentially just making a record that the snapshot was created and at such and such time -- when the snapshot is the same as the "live" filesystem... 
Then over time, each change to a block of data in the live system requires that a copy is made of the old data block and that is associated with the most recently created snapshot.... SO, as more and more changes are made to different blocks over time the snapshot becomes bigger and bigger. How big? Well it seems the current implementation does not keep a "simple counter" of the number of blocks -- but rather, a list of the blocks that were COW'ed.... So when you come and ask "How big"... GPFS has to go traverse the file system metadata and count those COW'ed blocks.... 3. So why not keep a counter? Well, it's likely not so simple. For starters GPFS is typically running concurrently on several or many nodes... And probably was not deemed worth the effort ..... IF a convincing case could be made, I'd bet there is a way... to at least keep approximate numbers, log records, exact updates periodically, etc, etc -- similar to the way space allocation and accounting is done for the live file system... -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Tue Jan 29 18:43:24 2019 From: cblack at nygenome.org (Christopher Black) Date: Tue, 29 Jan 2019 18:43:24 +0000 Subject: Re: [gpfsug-discuss] Querying size of snapshots In-Reply-To: References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> Message-ID: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Thanks for the quick and detailed reply! I had read the manual and was aware of the warnings about -d (mentioned in my PS). On systems with high churn (lots of temporary files, lots of big and small deletes along with many new files), I've previously used estimates of snapshot size as a useful signal on whether we can expect to see an increase in available space over the next few days as snapshots expire. I've used this technique on a few different more mainstream storage systems, but never on gpfs. I'd find it useful to have a similar way to monitor "space to be freed pending snapshot deletes" on gpfs. It sounds like there is not an existing solution for this so it would be a request for enhancement. 
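In the meantime, a rough sketch of the stopgap mentioned below -- running the expensive query off-hours and recording the output -- could look like this (file system name and log path are assumptions):

#!/bin/bash
# Hypothetical nightly cron job: record what "mmlssnapshot -d" reports so that
# space pending snapshot expiry can be tracked over time. Run it off-hours,
# since -d walks the file system metadata and is expensive.
FS=fsname                               # assumed file system name
LOG=/var/log/gpfs-snapshot-sizes.log    # assumed log location
{
echo "=== $(date -u '+%Y-%m-%dT%H:%M:%SZ') ==="
/usr/lpp/mmfs/bin/mmlssnapshot "$FS" -d --block-size 1G
} >> "$LOG" 2>&1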
This optional parameter can impact overall system performance. Avoid running the mmlssnapshot command with this parameter frequently or during periods of high file system activity. SOOOO.. there's that. 2. Next you may ask, HOW is that? Snapshots are maintained with a "COW" strategy -- They are created quickly, essentially just making a record that the snapshot was created and at such and such time -- when the snapshot is the same as the "live" filesystem... Then over time, each change to a block of data in live system requires that a copy is made of the old data block and that is associated with the most recently created snapshot.... SO, as more and more changes are made to different blocks over time the snapshot becomes bigger and bigger. How big? Well it seems the current implementation does not keep a "simple counter" of the number of blocks -- but rather, a list of the blocks that were COW'ed.... So when you come and ask "How big"... GPFS has to go traverse the file sytem metadata and count those COW'ed blocks.... 3. So why not keep a counter? Well, it's likely not so simple. For starters GPFS is typically running concurrently on several or many nodes... And probably was not deemed worth the effort ..... IF a convincing case could be made, I'd bet there is a way... to at least keep approximate numbers, log records, exact updates periodically, etc, etc -- similar to the way space allocation and accounting is done for the live file system... ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Jan 29 19:19:12 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 29 Jan 2019 20:19:12 +0100 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org> <369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Message-ID: You could put snapshot data in a separate storage pool. Then it should be visible how much space it occupies, but it?s a bit hard to see how this will be usable/manageable.. -jf tir. 29. jan. 2019 kl. 20:08 skrev Christopher Black : > Thanks for the quick and detailed reply! I had read the manual and was > aware of the warnings about -d (mentioned in my PS). > > On systems with high churn (lots of temporary files, lots of big and small > deletes along with many new files), I?ve previously used estimates of > snapshot size as a useful signal on whether we can expect to see an > increase in available space over the next few days as snapshots expire. > I?ve used this technique on a few different more mainstream storage > systems, but never on gpfs. > > I?d find it useful to have a similar way to monitor ?space to be freed > pending snapshot deletes? on gpfs. It sounds like there is not an existing > solution for this so it would be a request for enhancement. 
> > I?m not sure how much overhead there would be keeping a running counter > for blocks changed since snapshot creation or if that would completely fall > apart on large systems or systems with many snapshots. If that is a > consideration even having only an estimate for the oldest snapshot would be > useful, but I realize that can depend on all the other later snapshots as > well. Perhaps an overall ?size of all snapshots? would be easier to manage > and would still be useful to us. > > I don?t need this number to be 100% accurate, but a low or floor estimate > would be very useful. > > > > Is anyone else interested in this? Do other people have other ways to > estimate how much space they will get back as snapshots expire? Is there a > more efficient way of making such an estimate available to admins other > than running an mmlssnapshot -d every night and recording the output? > > > > Thanks all! > > Chris > > > > *From: * on behalf of Marc A > Kaplan > *Reply-To: *gpfsug main discussion list > *Date: *Tuesday, January 29, 2019 at 1:24 PM > *To: *gpfsug main discussion list > *Subject: *Re: [gpfsug-discuss] Querying size of snapshots > > > > 1. First off, let's RTFM ... > > *-d *Displays the amount of storage that is used by the snapshot. > This operation requires an amount of time that is proportional to the size > of the file system; therefore, > it can take several minutes or even hours on a large and heavily-loaded > file system. > This optional parameter can impact overall system performance. Avoid > running the * mmlssnapshot* > command with this parameter frequently or during periods of high file > system activity. > > SOOOO.. there's that. > > 2. Next you may ask, HOW is that? > > Snapshots are maintained with a "COW" strategy -- They are created > quickly, essentially just making a record that the snapshot was created and > at such and such time -- when the snapshot is the same as the "live" > filesystem... > > Then over time, each change to a block of data in live system requires > that a copy is made of the old data block and that is associated with the > most recently created snapshot.... SO, as more and more changes are made > to different blocks over time the snapshot becomes bigger and bigger. How > big? Well it seems the current implementation does not keep a "simple > counter" of the number of blocks -- but rather, a list of the blocks that > were COW'ed.... So when you come and ask "How big"... GPFS has to go > traverse the file sytem metadata and count those COW'ed blocks.... > > 3. So why not keep a counter? Well, it's likely not so simple. For > starters GPFS is typically running concurrently on several or many > nodes... And probably was not deemed worth the effort ..... IF a > convincing case could be made, I'd bet there is a way... to at least keep > approximate numbers, log records, exact updates periodically, etc, etc -- > similar to the way space allocation and accounting is done for the live > file system... > > > ------------------------------ > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Tue Jan 29 21:37:08 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Tue, 29 Jan 2019 22:37:08 +0100 Subject: [gpfsug-discuss] Querying size of snapshots In-Reply-To: References: <51DD7A04-B0EE-4FDE-8FF6-E888A2C51E23@nygenome.org><369E2850-0335-468D-9F86-B18D4576A04C@nygenome.org> Message-ID: An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Wed Jan 30 13:16:22 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Wed, 30 Jan 2019 13:16:22 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space Message-ID: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rg1 Type: application/octet-stream Size: 13340 bytes Desc: rg1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rg2 Type: application/octet-stream Size: 13340 bytes Desc: rg2 URL: From abeattie at au1.ibm.com Wed Jan 30 14:53:47 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Wed, 30 Jan 2019 14:53:47 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Jan 30 20:25:20 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 30 Jan 2019 15:25:20 -0500 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch> Message-ID: Alvise, Could you send us the output of the following commands from both server nodes. mmfsadm dump nspdclient > /tmp/dump_nspdclient. mmfsadm dump pdisk > /tmp/dump_pdisk. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . 
If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2019 08:24 AM Subject: [gpfsug-discuss] Unbalanced pdisk free space Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise [attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment "rg2" deleted by Brian Herr/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=QDZ-afehEgpYi3JGRd8q6rHgo4rb8gVu_VKQwg4MwEs&s=5bEFHRU7zk-nRK_d20vJBngQOOkSLWT1vvtcDNKD584&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Jan 30 20:51:49 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 30 Jan 2019 20:51:49 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= Message-ID: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Jan 30 21:02:26 2019 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 30 Jan 2019 21:02:26 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Message-ID: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. 
So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Wed Jan 30 21:16:51 2019 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Wed, 30 Jan 2019 18:16:51 -0300 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= In-Reply-To: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> Message-ID: We have (pre)shutdown and pre(startup) ... Trap and record both... If you see a startup without a matching shutdown you know the shutdown never happened, because GPFS crashed. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 01/30/2019 05:52 PM Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Sent by: gpfsug-discuss-bounces at spectrumscale.org Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=oBQHDWo5PVKthJjmbVrQyqSrkuFZEcMQb_tXtvcKepE&s=HfF_wArTvc-i4wLfATXbwrImRT-w0mKG8mhctBJFLCI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Wed Jan 30 21:52:48 2019 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Wed, 30 Jan 2019 21:52:48 +0000 Subject: [gpfsug-discuss] =?windows-1252?q?Node_=91crash_and_restart=92_ev?= =?windows-1252?q?ent_using_GPFS_callback=3F?= In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com>, <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com> Message-ID: <063B3F21-8695-4454-8D1A-B1734B1AD436@med.mun.ca> Could you get away with running ?mmdiag ?stats? and inspecting the uptime information it provides? Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. 
John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 On Jan 30, 2019, at 5:32 PM, Sanchez, Paul > wrote: There are some cases which I don?t believe can be caught with callbacks (e.g. DMS = Dead Man Switch). But you could possibly use preStartup to check the host uptime to make an assumption if GPFS was restarted long after the host booted. You could also peek in /tmp/mmfs and only report if you find something there. That said, the docs say that preStartup fires after the node joins the cluster. So if that means once the node is ?active? then you might miss out on nodes stuck in ?arbitrating? for a while due to a waiter problem. We run a script with cron which monitors the myriad things which can go wrong and attempt to right those which are safe to fix, and raise alerts appropriately. Something like that, outside the reach of GPFS, is often a good choice if you don?t need to know something the moment it happens. Thx Paul From: gpfsug-discuss-bounces at spectrumscale.org > On Behalf Of Oesterlin, Robert Sent: Wednesday, January 30, 2019 3:52 PM To: gpfsug main discussion list > Subject: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? Anyone crafted a good way to detect a node ?crash and restart? event using GPFS callbacks? I?m thinking ?preShutdown? but I?m not sure if that?s the best. What I?m really looking for is did the node shutdown (abort) and create a dump in /tmp/mmfs Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Jan 31 01:19:47 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 31 Jan 2019 01:19:47 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Node_=E2=80=98crash_and_restart?= =?utf-8?q?=E2=80=99_event_using_GPFS_callback=3F?= Message-ID: <554E186D-30BD-4E7D-859C-339F5DDAD442@nuance.com> Actually, I think ?preShutdown? will do it since it passes the type of shutdown ?abnormal? for a crash to the call back - I can use that to send a Slack message. mmaddcallback node-abort --event preShutdown --command /usr/local/sbin/callback-test.sh --parms "%eventName %reason" and you get either: preShutdown normal preShutdown abnormal Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Wednesday, January 30, 2019 at 3:17 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Node ?crash and restart? event using GPFS callback? We have (pre)shutdown and pre(startup) ... Trap and record both... If you see a startup without a matching shutdown you know the shutdown never happened, because GPFS crashed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alastair.smith at ucl.ac.uk Wed Jan 30 14:11:08 2019 From: alastair.smith at ucl.ac.uk (Smith, Alastair) Date: Wed, 30 Jan 2019 14:11:08 +0000 Subject: [gpfsug-discuss] Job opportunity at UCL Research Data Services Message-ID: Dear List Members, We would like to draw you attention to a job opportunity at UCL for a Senior Research Data Systems Engineer. The is a technical role in the Research Data Services Group, part of UCL's large and well-established Research IT Services team. 
The Senior Data Systems Engineer leads the development of technical strategy for Research Data Services at UCL. The successful applicant will ensure that appropriate technologies and workflows are used to address research data management requirements across the institution, particularly those relating to data storage and access. The Research Data Services Group provides petabyte-scale data storage for active research projects, and is about to launch a long-term data repository service. Over the coming years, the Group will be building an integrated suite of services to support data management from planning to re-use, and the successful candidate will play an important role in the design and operation of these services. The post comes with a competitive salary and a central London working location. The closing date for applications it 2nd February. Further particulars and a link to the application form are available from https://tinyurl.com/ucljobs-rdse. -|-|-|-|-|-|-|-|-|-|-|-|-|- Dr Alastair Smith Senior research data systems engineer Research Data Services, RITS Information Services Division University College London 1 St Martin's- Le-Grand London EC1A 4AS -------------- next part -------------- An HTML attachment was scrubbed... URL: From alvise.dorigo at psi.ch Thu Jan 31 09:48:12 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Thu, 31 Jan 2019 09:48:12 +0000 Subject: [gpfsug-discuss] Unbalanced pdisk free space In-Reply-To: References: <83A6EEB0EC738F459A39439733AE8045267DF159@MBX114.d.ethz.ch>, Message-ID: <83A6EEB0EC738F459A39439733AE8045267E32C0@MBX114.d.ethz.ch> They're attached. Thanks! Alvise ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com] Sent: Wednesday, January 30, 2019 9:25 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Unbalanced pdisk free space Alvise, Could you send us the output of the following commands from both server nodes. * mmfsadm dump nspdclient > /tmp/dump_nspdclient. * mmfsadm dump pdisk > /tmp/dump_pdisk. * Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. 
From: "Dorigo Alvise (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 01/30/2019 08:24 AM Subject: [gpfsug-discuss] Unbalanced pdisk free space Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I've a Lenovo Spectrum Scale system DSS-G220 (software dss-g-2.0a) composed of 2x x3560 M5 IO server nodes 1x x3550 M5 client/support node 2x disk enclosures D3284 GPFS/GNR 4.2.3-7 Can anybody tell me if it is normal that all the pdisks of both my recovery groups, residing on the same physical enclosure have free space equal to (more or less) 1/3 of the free space of the pdisks residing on the other physical enclosure (see attached text files for the command line output) ? I guess when the least free disks are fully occupied (while the others are still partially free) write performance will drop by a factor of two. Correct ? Is there a way (considering that the system is in production) to fix (rebalance) this free space among all pdisk of both enclosures ? Should I open a PMR to IBM ? Many thanks, Alvise [attachment "rg1" deleted by Brian Herr/Poughkeepsie/IBM] [attachment "rg2" deleted by Brian Herr/Poughkeepsie/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_nspdclient.sf-dssio-1 Type: application/octet-stream Size: 570473 bytes Desc: dump_nspdclient.sf-dssio-1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_nspdclient.sf-dssio-2 Type: application/octet-stream Size: 566924 bytes Desc: dump_nspdclient.sf-dssio-2 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_pdisk.sf-dssio-1 Type: application/octet-stream Size: 682312 bytes Desc: dump_pdisk.sf-dssio-1 URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dump_pdisk.sf-dssio-2 Type: application/octet-stream Size: 619497 bytes Desc: dump_pdisk.sf-dssio-2 URL: From heiner.billich at psi.ch Thu Jan 31 14:56:21 2019 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Thu, 31 Jan 2019 14:56:21 +0000 Subject: [gpfsug-discuss] Token manager - how to monitor performance? Message-ID: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch> Hello, Sorry for coming up with this never-ending story. I know that token management is mainly autoconfigured and even the placement of token manager nodes is no longer under user control in all cases. Still I would like to monitor this component to see if we are close to some limit like memory or rpc rate. Especially as we?ll do some major changes to our setup soon. I would like to monitor the performance of our token manager nodes to get warned _before_ we get performance issues. Any advice is welcome. Ideally I would like collect some numbers and pass them on to influxdb or similar. I didn?t find anything in perfmon/zimon that seemed to match. I could imagine that numbers like ?number of active tokens? and ?number of token operations? per manager would be helpful. Or ?# of rpc calls per second?. And maybe ?number of open files?, ?number of token operations?, ?number of tokens? for clients. And maybe some percentage of used token memory ? and cache hit ratio ? This would also help to tune ? 
From heiner.billich at psi.ch Thu Jan 31 14:56:21 2019
From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI))
Date: Thu, 31 Jan 2019 14:56:21 +0000
Subject: [gpfsug-discuss] Token manager - how to monitor performance?
Message-ID: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch>

Hello,

Sorry for coming up with this never-ending story. I know that token management is mainly autoconfigured, and even the placement of token manager nodes is no longer under user control in all cases. Still, I would like to monitor this component to see if we are close to some limit like memory or rpc rate, especially as we'll make some major changes to our setup soon.

I would like to monitor the performance of our token manager nodes to get warned _before_ we get performance issues. Any advice is welcome. Ideally I would like to collect some numbers and pass them on to influxdb or similar. I didn't find anything in perfmon/zimon that seemed to match.

I could imagine that numbers like "number of active tokens" and "number of token operations" per manager would be helpful. Or "# of rpc calls per second". And maybe "number of open files", "number of token operations", "number of tokens" for clients. And maybe some percentage of used token memory? And cache hit ratio? This would also help to tune - like if a client does very many token operations or rpc calls, maybe I should increase maxFilesToCache.

The above is just to illustrate; as token management is complicated, the really valuable metrics may be different. Or am I too anxious and should wait and see instead?

cheers,
Heiner

--
Paul Scherrer Institut
Heiner Billich
System Engineer Scientific Computing
Science IT / High Performance Computing
WHGA/106
Forschungsstrasse 111
5232 Villigen PSI
Switzerland

Phone +41 56 310 36 02
heiner.billich at psi.ch
https://www.psi.ch

From TOMP at il.ibm.com Thu Jan 31 15:11:24 2019
From: TOMP at il.ibm.com (Tomer Perry)
Date: Thu, 31 Jan 2019 17:11:24 +0200
Subject: [gpfsug-discuss] Token manager - how to monitor performance?
In-Reply-To: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch>
References: <02FE0AE6-BDDC-4E10-9C41-E68EB91758AA@psi.ch>
Message-ID:

Hi,

I agree that we should potentially add more metrics, but for a start, I would look into mmdiag --memory and mmdiag --tokenmgr (the latter shows different output on a token server).

Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: tomp at il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel: +1 720 3422758
Israel Tel: +972 3 9188625
Mobile: +972 52 2554625

From: "Billich Heinrich Rainer (PSI)"
To: gpfsug main discussion list
Date: 31/01/2019 16:56
Subject: [gpfsug-discuss] Token manager - how to monitor performance?
Sent by: gpfsug-discuss-bounces at spectrumscale.org

[Heiner's original message quoted in full here - see above.]
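(Building on the mmdiag suggestion above and the wish to land numbers in InfluxDB: a rough, cron-able sketch of the plumbing. mmdiag output is not a stable interface and varies between releases, so this deliberately ships only a trivial placeholder metric plus the raw snapshots; the InfluxDB URL, database name, and output paths are assumptions.)

#!/bin/bash
# Snapshot token-manager related diagnostics and push a placeholder metric
# to InfluxDB via the 1.x line-protocol write endpoint.
INFLUX_URL="http://influxdb.example.com:8086/write?db=gpfs"   # assumed endpoint
HOST=$(hostname -s)
NOW=$(date +%s%N)
OUTDIR=/var/tmp/tokenmon
mkdir -p "$OUTDIR"

# Keep the raw output so the parsing can be refined once the exact format
# on this release is known.
/usr/lpp/mmfs/bin/mmdiag --memory   > "$OUTDIR/mmdiag_memory.$HOST"   2>/dev/null
/usr/lpp/mmfs/bin/mmdiag --tokenmgr > "$OUTDIR/mmdiag_tokenmgr.$HOST" 2>/dev/null

# Placeholder metric (size of the tokenmgr report) just to prove the pipeline;
# replace with values parsed from the snapshots above.
tokbytes=$(wc -c < "$OUTDIR/mmdiag_tokenmgr.$HOST")
curl -s -XPOST "$INFLUX_URL" --data-binary \
    "gpfs_tokenmgr,host=$HOST report_bytes=${tokbytes}i $NOW" >/dev/null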
From Kevin.Buterbaugh at Vanderbilt.Edu Wed Jan 30 21:15:48 2019
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Wed, 30 Jan 2019 21:15:48 +0000
Subject: [gpfsug-discuss] Node 'crash and restart' event using GPFS callback?
In-Reply-To: <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
Message-ID:

Hi Bob,

We use the nodeLeave callback to detect node expels - for what you're wanting to do, I wonder if nodeJoin might work? If a node joins the cluster and then has an uptime of only a few minutes, you could go looking in /tmp/mmfs.

HTH...

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

On Jan 30, 2019, at 3:02 PM, Sanchez, Paul wrote:

[Paul's and Bob's messages quoted in full here - see the start of this thread.]

From makaplan at us.ibm.com Thu Jan 31 15:40:50 2019
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Thu, 31 Jan 2019 12:40:50 -0300
Subject: [gpfsug-discuss] Node 'crash and restart' event using GPFS callback?
In-Reply-To:
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
Message-ID:

Various "leave" / "join" events may be interesting ... But you've got to consider that an abrupt failure of several nodes is not necessarily recorded anywhere! For example, because the would-be recording devices might all lose power at the same time.
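(Pulling together the uptime idea and the callback approach: a rough sketch of a preStartup callback that flags a likely crash-and-restart. The script path, threshold, and use of logger are assumptions, not anything GPFS-specific; it would be registered with something like: mmaddcallback crash-detect --event preStartup --command /usr/local/sbin/crash-detect.sh)

#!/bin/bash
# /usr/local/sbin/crash-detect.sh (hypothetical path)
# Heuristic from the thread: if the host has been up much longer than a normal
# boot-to-GPFS-start window, this startup is probably a daemon restart, not a clean boot.

UPTIME_SECS=$(awk '{print int($1)}' /proc/uptime)
THRESHOLD=600    # assumed: >10 minutes of host uptime at GPFS start looks like a restart

if [ "$UPTIME_SECS" -gt "$THRESHOLD" ]; then
    # An abort usually leaves something behind in the default dump directory
    recent=$(find /tmp/mmfs -maxdepth 1 -mmin -120 -type f 2>/dev/null | head -5 | tr '\n' ' ')
    if [ -n "$recent" ]; then
        logger -t gpfs-crash-detect \
            "GPFS restarted on $(hostname -s) after ${UPTIME_SECS}s of host uptime; recent files in /tmp/mmfs: ${recent}"
    fi
fi
exit 0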
From Robert.Oesterlin at nuance.com Thu Jan 31 15:46:38 2019
From: Robert.Oesterlin at nuance.com (Oesterlin, Robert)
Date: Thu, 31 Jan 2019 15:46:38 +0000
Subject: [gpfsug-discuss] Node 'crash and restart' event using GPFS callback?
In-Reply-To:
References: <7F7F8D51-256D-4EA7-B03F-11CF7B752AFA@nuance.com> <9892d4f382cb4fbe80dbde9f87724632@mbxtoa1.winmail.deshaw.com>
Message-ID: <572FF01C-A82D-45FD-AB34-A897BFE59325@nuance.com>

A better way to detect node expels is to install the expelnode script into /var/mmfs/etc/ (sample in /usr/lpp/mmfs/samples/expelnode.sample) - put this on your manager nodes. It runs on every expel and you can customize it pretty easily. We generate a Slack message to a specific channel:

GPFS Node Expel
nrg1 APP [1:56 AM]
nrg1-gpfs01 Expelling node gnj-r05r05u30, other node cnt-r04r08u40

Bob Oesterlin
Sr Principal Storage Engineer, Nuance
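(If it helps anyone, a sketch of the notification piece one might graft into a copy of expelnode.sample. Keep the sample's own argument handling and exit codes - they control the expel decision - and treat the webhook URL and the example call at the bottom as placeholders:)

# --- add near the top of your copy of expelnode.sample ---
WEBHOOK="https://hooks.slack.com/services/XXX/YYY/ZZZ"    # placeholder webhook URL

notify_slack() {
    # $* : free-text description of the expel event
    curl -s -X POST -H 'Content-type: application/json' \
        --data "{\"text\": \"GPFS expel on $(hostname -s): $*\"}" \
        "$WEBHOOK" >/dev/null 2>&1
}

# --- then call it wherever the sample logs the expel, e.g.: ---
# notify_slack "Expelling node $1, other node $2"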
From: on behalf of "Buterbaugh, Kevin L"
Reply-To: gpfsug main discussion list
Date: Thursday, January 31, 2019 at 9:19 AM
To: gpfsug main discussion list
Subject: [EXTERNAL] Re: [gpfsug-discuss] Node 'crash and restart' event using GPFS callback?

[Kevin's, Paul's, and Bob's messages quoted in full here - see earlier in this thread.]

From chair at spectrumscale.org Thu Jan 31 20:44:25 2019
From: chair at spectrumscale.org (Simon Thompson (Spectrum Scale User Group Chair))
Date: Thu, 31 Jan 2019 20:44:25 +0000
Subject: [gpfsug-discuss] Call for input & save the date
Message-ID: <213C4D17-C0D2-4883-834F-7E2E00B4EE3F@spectrumscale.org>

Hi All,

We've just published the main dates for 2019 Spectrum Scale meetings on the user group website at: https://www.spectrumscaleug.org/

Please take a look over the list of events and pencil them in your diary! (Some of those later in the year are tentative, and there are a couple more that might get added in some other territories.)

Kristy, Bob, Chris, Ulf and I are currently having some discussion on the topics we'd like to have covered in the various user group meetings. If you have any specific topics you'd like to hear about, then please let me know in the next few days - we can't promise we can get a speaker, but if you don't let us know we can't try!

As usual, we'll be looking for user speakers for all of our events. The user group events only work well if we have people talking about their uses of Spectrum Scale, so please think about offering a talk and let us know!

Thanks

Simon
UK Group Chair