From r.sobey at imperial.ac.uk Tue Dec 1 09:22:43 2015 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 1 Dec 2015 09:22:43 +0000 Subject: [gpfsug-discuss] Introduction Message-ID: Good morning all, I'm Richard and work in the storage team at Imperial College London. We've had a GPFS/CTDB/Samba cluster for over a year where research group data can be stored. Other responsibilities include Cisco UCS, the storage, some IBM filers and other general datacentre related activities. I've been subscribed to this list for over a year but due to sever user error I haven't seen any emails from it until now... I did think it was a bit quiet! Looking forward to seeing more real world problems and fixes surrounding GPFS especially as we begin to plan our upgrade from 3.5 to 4.x (whichever is stable and available). Richard Richard Sobey Storage Area Network (SAN) Analyst Technical Operations, ICT Imperial College London South Kensington 403, City & Guilds Building London SW7 2AZ Tel: +44 (0)20 7594 6915 Email: r.sobey at imperial.ac.uk http://www.imperial.ac.uk/admin-services/ict/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Tue Dec 1 17:02:11 2015 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 1 Dec 2015 12:02:11 -0500 Subject: [gpfsug-discuss] IO performance of replicated GPFS filesystem In-Reply-To: <8b3278e23a5b42a3be80629ee18f307b@R01UKEXCASM223.r01.fujitsu.local> References: <8b3278e23a5b42a3be80629ee18f307b@R01UKEXCASM223.r01.fujitsu.local> Message-ID: <201512011702.tB1H2MuV027696@d03av02.boulder.ibm.com> Generally yes. When reading, more disks is always better then fewer disks, both for replication and with striping over several or many disks. When writing, more disks is good with striping. But yes, replication costs you extra writes. Those writes don't necessarily cost you loss of time, provided they can be done concurrently. When I write "disks" I mean storage devices that can be accessed concurrently. Watch out for virtual LUNs. With conventional controllers and drives, it does GPFS little or no good when multiple LUNs map to the same real disk device, since multiple operations to different LUNs will ultimately be serialized at one real disk arm/head! For high performance, you should not be thinking about "two NSDs" ... you should be thinking about many NSD, so data and metadata can be striped, and written and read concurrently. But yes, for replication purposes you have to consider defining and properly configuring at least two "failure groups". From: "Tomasz.Wolski at ts.fujitsu.com" To: "gpfsug-discuss at spectrumscale.org" Date: 11/30/2015 05:46 AM Subject: [gpfsug-discuss] IO performance of replicated GPFS filesystem Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, I could use some help of the experts here J Please correct me if I?m wrong: I suspect that GPFS filesystem READ performance is better when filesystem is replicated to i.e. two failure groups, where these failure groups are placed on separate RAID controllers. In this case WRITE performance should be worse, since the same data must go to two locations. What about situation where GPFS filesystem has two metadataOnly NSDs which are also replicated? Does metadata READ performance increase in this way as well (and WRITE decreases)? Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Tue Dec 1 17:29:48 2015 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Tue, 1 Dec 2015 17:29:48 +0000 Subject: [gpfsug-discuss] IO performance of replicated GPFS filesystem In-Reply-To: <201512011702.tB1H2MuV027696@d03av02.boulder.ibm.com> References: <8b3278e23a5b42a3be80629ee18f307b@R01UKEXCASM223.r01.fujitsu.local> <201512011702.tB1H2MuV027696@d03av02.boulder.ibm.com> Message-ID: <81116cc7ad184851a48794dc1b1c903d@mbxpsc3.winmail.deshaw.com> All of Marc?s points are good. A few more things to be aware of with regard to replicated writes: ? Each client performs its own replication when it writes file data. So if you have several clients, each writing files concurrently, the ?bandwidth burden? of the replication is distributed among them. It?s typical that your write throughput will be limited by disk in this case. ? Because clients perform their own write replication, the max write throughput of a NSD client is limited to <50% of its available network bandwidth for 2x replication, or <33% for 3x replication, since it must share the network interface (Ethernet, IB) to access the NSDs in each failure group. ? If your network topology is asymmetric (e.g. multiple dataceters with higher latency and limited bandwidth between them) you may also benefit from using ?readReplicaPolicy=fastest? to keep read traffic ?local? and avoid crossing congested or high-latency paths. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Tuesday, December 01, 2015 12:02 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] IO performance of replicated GPFS filesystem Generally yes. When reading, more disks is always better then fewer disks, both for replication and with striping over several or many disks. When writing, more disks is good with striping. But yes, replication costs you extra writes. Those writes don't necessarily cost you loss of time, provided they can be done concurrently. When I write "disks" I mean storage devices that can be accessed concurrently. Watch out for virtual LUNs. With conventional controllers and drives, it does GPFS little or no good when multiple LUNs map to the same real disk device, since multiple operations to different LUNs will ultimately be serialized at one real disk arm/head! For high performance, you should not be thinking about "two NSDs" ... you should be thinking about many NSD, so data and metadata can be striped, and written and read concurrently. But yes, for replication purposes you have to consider defining and properly configuring at least two "failure groups". From: "Tomasz.Wolski at ts.fujitsu.com" > To: "gpfsug-discuss at spectrumscale.org" > Date: 11/30/2015 05:46 AM Subject: [gpfsug-discuss] IO performance of replicated GPFS filesystem Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi All, I could use some help of the experts here ?Please correct me if I?m wrong: I suspect that GPFS filesystem READ performance is better when filesystem is replicated to i.e. two failure groups, where these failure groups are placed on separate RAID controllers. In this case WRITE performance should be worse, since the same data must go to two locations. What about situation where GPFS filesystem has two metadataOnly NSDs which are also replicated? Does metadata READ performance increase in this way as well (and WRITE decreases)? Best regards, Tomasz Wolski_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Wed Dec 2 02:00:08 2015 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 2 Dec 2015 02:00:08 +0000 Subject: [gpfsug-discuss] Introduction In-Reply-To: References: Message-ID: Hi Richard, It was evening when I got your message, so keep in mind it is a global list. I'm based in Australia and joined the list yesterday after attending the SC15 User Group meeting. Your message is the first I've had arrive in my inbox. We (organisation details in my signature below) have a GPFS scratch filesystem for our HPC clusters that has been running for a year and is only now being eased into production. It's running 4.1.0.6. We also have a 4.1.1.2 POC system running on a VM with an LTSF backend. There are plans to use GPFS to provide a highly available NFS service as well. Also looking forward to seeing how others are using Spectrum Scale in anger. Cheers, Greg Lehmann Senior High Performance Data Specialist Data Services | Scientific Computing Platforms CSIRO Information Management and Technology Phone: +61 7 3327 4137 | Fax: +61 1 3327 4455 Greg.Lehmann at csiro.au | www.csiro.au Address: 1 Technology Court, Pullenvale, QLD 4069 PLEASE NOTE The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Tuesday, 1 December 2015 7:23 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Introduction Good morning all, I'm Richard and work in the storage team at Imperial College London. We've had a GPFS/CTDB/Samba cluster for over a year where research group data can be stored. Other responsibilities include Cisco UCS, the storage, some IBM filers and other general datacentre related activities. I've been subscribed to this list for over a year but due to sever user error I haven't seen any emails from it until now... I did think it was a bit quiet! Looking forward to seeing more real world problems and fixes surrounding GPFS especially as we begin to plan our upgrade from 3.5 to 4.x (whichever is stable and available). Richard Richard Sobey Storage Area Network (SAN) Analyst Technical Operations, ICT Imperial College London South Kensington 403, City & Guilds Building London SW7 2AZ Tel: +44 (0)20 7594 6915 Email: r.sobey at imperial.ac.uk http://www.imperial.ac.uk/admin-services/ict/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kraemerf at de.ibm.com Wed Dec 2 18:14:49 2015 From: kraemerf at de.ibm.com (Frank Kraemer) Date: Wed, 2 Dec 2015 19:14:49 +0100 Subject: [gpfsug-discuss] Spectrum Scale next generation hadoop connector is available for public download Message-ID: <201512021815.tB2IFu4b015005@d06av09.portsmouth.uk.ibm.com> Spectrum Scale next generation hadoop connector is available for public download from YONG ZHENG, Spectrum Scale HDFS transparency (or HDFS protocol), the next generation Hadoop connector, is available for public download from IBM developerWorks: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Hadoop%20Connector%20Download%20%26%20Info The 1st release 2.7.0-0 supports FPO storage model (internal disk model). For shared storage model (e.g. ESS, SAN-based storage), it's in progress and will be available around Jan, 2016. The key advantage for HDFS transparency includes: 1.GPFS client free(it doesn't need every Hadoop node to install GPFS client) 2.Full kerberos support for enterprise security requirement 3.make some hdfs hard-coded components working(e.g. Impala which will call HDFS client directly instead calling Hadoop FileSystem interface) 4.make some popular features working: discp, webhdfs, multiple hadoop cluster over the same Spectrum Scale file systems etc refer the above link for more details. -frank- Frank Kraemer IBM Consulting IT Specialist / Client Technical Architect Hechtsheimer Str. 2, 55131 Mainz mailto:kraemerf at de.ibm.com voice: +49171-3043699 IBM Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Wed Dec 2 22:22:05 2015 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Wed, 2 Dec 2015 22:22:05 +0000 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? Message-ID: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> We have a relatively mature NetBackup environment which handles all of our tape backup requirements (e.g. databases, NetApp via NDMP, Windows shares, and native Linux - which provides rudimentary coverage of GPFS). Because of this existing base, we're hesitant to invest in a completely separate infrastructure just for the benefit of GPFS backups. While this works well for many filesets, it's not ideal for very large ones. We've been trying to get Veritas to support a more parallel file-scan and file-copy approach to backing up GPFS filesystems, and have repeatedly hit a brick wall there over the past year. But I have a recent thread in which they note that there is a fairly large and vocal GPFS customer who drives significant feature flow in their product. Any chance that customer is a member of the User Group? If so, I'd love to hear from you. Thanks, Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 3 14:46:25 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 3 Dec 2015 14:46:25 +0000 Subject: [gpfsug-discuss] 4.2 - Performance Collector node - Scaling Message-ID: <3E1BCECB-9B7C-4C87-85A4-5326CB175C69@nuance.com> In the 4.2 documentation under ?Manually Installing the Performance Monitoring Tool? there is the statement: "A single collector can easily support at least 400 sensor nodes.? During the user group meeting, there were discussions on scalability. Is is safe to assume that a cluster of 350 nodes can be configured to use a single collector node? Bob Oesterlin Sr Storage Engineer, Nuance Communications -------------- next part -------------- An HTML attachment was scrubbed... URL: From GARWOODM at uk.ibm.com Thu Dec 3 14:57:31 2015 From: GARWOODM at uk.ibm.com (Michael Garwood7) Date: Thu, 3 Dec 2015 14:57:31 +0000 Subject: [gpfsug-discuss] 4.2 - Performance Collector node - Scaling In-Reply-To: <3E1BCECB-9B7C-4C87-85A4-5326CB175C69@nuance.com> References: <3E1BCECB-9B7C-4C87-85A4-5326CB175C69@nuance.com> Message-ID: <201512031458.tB3EwTDE010953@d06av05.portsmouth.uk.ibm.com> An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Dec 3 15:08:05 2015 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 3 Dec 2015 10:08:05 -0500 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? In-Reply-To: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> Message-ID: <201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> Perhaps you can use mmapplypolicy which has a parallel file scan and parallel execute whatever script you like against the files found capabilities to "drive" your backups. That is what the IBM supported mmbackup does. mmbackup is supported for use with IBM/Tivoli TSM. Of course IBM would like you to buy TSM, but you are free to "mashup" mmapplypolicy with any other software you choose. Also we recently shipped an improved version of samples/ilm/mmfind that makes it easy to exploit parallel find and execute without sweating the details of mmapplypolicy and its peculiar policy SQL/rules. -- marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Dec 3 15:18:03 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 3 Dec 2015 15:18:03 +0000 Subject: [gpfsug-discuss] 4.2 - Performance Collector node - Scaling In-Reply-To: <201512031458.tB3EwTDE010953@d06av05.portsmouth.uk.ibm.com> References: <3E1BCECB-9B7C-4C87-85A4-5326CB175C69@nuance.com> <201512031458.tB3EwTDE010953@d06av05.portsmouth.uk.ibm.com> Message-ID: Hi Mike ? Thanks, The documentation says ?at least 400? but doesn?t define an acceptable ratio. If you mean ?don?t do more than 400? then it should state that. Bob Oesterlin Sr Storage Engineer, Nuance Communications From: > on behalf of Michael Garwood7 > Reply-To: gpfsug main discussion list > Date: Thursday, December 3, 2015 at 8:57 AM To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 - Performance Collector node - Scaling Hi Bob, Yes, 350 nodes should be fine since it is under the 400 acceptable limit. Generally the only concern with a large number of sensors is the volume of data you may need to sift through. Regards, Michael Spectrum Scale Developer -------------- next part -------------- An HTML attachment was scrubbed... URL: From mweil at genome.wustl.edu Thu Dec 3 15:34:43 2015 From: mweil at genome.wustl.edu (Matt Weil) Date: Thu, 3 Dec 2015 09:34:43 -0600 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? In-Reply-To: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> Message-ID: <56606113.1080206@genome.wustl.edu> Paul, We currently run netbackup to push about 1.3PB of real data to tape. This using 1 nb master and a single media server that is also a GPFS client. The media server uses the spare file system space as a staging area before writing to tape. We have recently invested into a TSM server due to limitations of netbackup. The PVU licensing model makes TSM cost effective. We simply are not able to speed up the netbackup catalog even with SSD. You could potentially use the gpfs ilm engine to create file lists to feed to netbackup. netbackup (now back to veritas) does not officially support GPFS. Netbackup is not aware of gpfs metadata. Matt On 12/2/15 4:22 PM, Sanchez, Paul wrote: > We have a relatively mature NetBackup environment which handles all of > our tape backup requirements (e.g. databases, NetApp via NDMP, Windows > shares, and native Linux ? which provides rudimentary coverage of > GPFS). Because of this existing base, we?re hesitant to invest in a > completely separate infrastructure just for the benefit of GPFS > backups. While this works well for many filesets, it?s not ideal for > very large ones. > We?ve been trying to get Veritas to support a more parallel file-scan > and file-copy approach to backing up GPFS filesystems, and have > repeatedly hit a brick wall there over the past year. But I have a > recent thread in which they note that there is a fairly large and > vocal GPFS customer who drives significant feature flow in their > product. Any chance that customer is a member of the User Group? If > so, I?d love to hear from you. > Thanks, > Paul > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ____ This email message is a private communication. The information transmitted, including attachments, is intended only for the person or entity to which it is addressed and may contain confidential, privileged, and/or proprietary material. Any review, duplication, retransmission, distribution, or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is unauthorized by the sender and is prohibited. If you have received this message in error, please contact the sender immediately by return email and delete the original message from all computer systems. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Thu Dec 3 15:41:40 2015 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 3 Dec 2015 11:41:40 -0400 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? In-Reply-To: <56606113.1080206@genome.wustl.edu> References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> <56606113.1080206@genome.wustl.edu> Message-ID: Matt, this was true for a while, but got fixed, Netbackup has added support for GPFS metadata and acls in newer versions. more details can be read here : https://www.veritas.com/support/en_US/article.000079433 sven On Thu, Dec 3, 2015 at 11:34 AM, Matt Weil wrote: > Paul, > > We currently run netbackup to push about 1.3PB of real data to tape. This > using 1 nb master and a single media server that is also a GPFS client. > The media server uses the spare file system space as a staging area before > writing to tape. We have recently invested into a TSM server due to > limitations of netbackup. The PVU licensing model makes TSM cost > effective. We simply are not able to speed up the netbackup catalog even > with SSD. You could potentially use the gpfs ilm engine to create file > lists to feed to netbackup. > > netbackup (now back to veritas) does not officially support GPFS. > Netbackup is not aware of gpfs metadata. > > Matt > > > On 12/2/15 4:22 PM, Sanchez, Paul wrote: > > We have a relatively mature NetBackup environment which handles all of our > tape backup requirements (e.g. databases, NetApp via NDMP, Windows shares, > and native Linux ? which provides rudimentary coverage of GPFS). Because > of this existing base, we?re hesitant to invest in a completely separate > infrastructure just for the benefit of GPFS backups. While this works well > for many filesets, it?s not ideal for very large ones. > > We?ve been trying to get Veritas to support a more parallel file-scan and > file-copy approach to backing up GPFS filesystems, and have repeatedly hit > a brick wall there over the past year. But I have a recent thread in which > they note that there is a fairly large and vocal GPFS customer who drives > significant feature flow in their product. Any chance that customer is a > member of the User Group? If so, I?d love to hear from you. > > Thanks, > Paul > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > ____ This email message is a private communication. The information > transmitted, including attachments, is intended only for the person or > entity to which it is addressed and may contain confidential, privileged, > and/or proprietary material. Any review, duplication, retransmission, > distribution, or other use of, or taking of any action in reliance upon, > this information by persons or entities other than the intended recipient > is unauthorized by the sender and is prohibited. If you have received this > message in error, please contact the sender immediately by return email and > delete the original message from all computer systems. Thank you. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Paul.Sanchez at deshaw.com Thu Dec 3 16:12:25 2015 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Thu, 3 Dec 2015 16:12:25 +0000 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? In-Reply-To: <201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> <201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> Message-ID: Yes, we did something very similar: creating file shard lists and feeding those to the bpbackup CLI tool to schedule. In theory, this is also a node-scalable approach when using a synthetic client name shared by many nodes (all enumerated by a round-robin DNS A record) . But there are some serious limitations (e.g. no way to avoid implicit directory recursion without failing to capture directory permissions) that make this less than ideal. That's a NetBackup issue, of course, not a SpectrumScale limitation. But unfortunately it isn't something Veritas shows any interest in fixing. -Paul From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Marc A Kaplan Sent: Thursday, December 03, 2015 10:08 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? Perhaps you can use mmapplypolicy which has a parallel file scan and parallel execute whatever script you like against the files found capabilities to "drive" your backups. That is what the IBM supported mmbackup does. mmbackup is supported for use with IBM/Tivoli TSM. Of course IBM would like you to buy TSM, but you are free to "mashup" mmapplypolicy with any other software you choose. Also we recently shipped an improved version of samples/ilm/mmfind that makes it easy to exploit parallel find and execute without sweating the details of mmapplypolicy and its peculiar policy SQL/rules. -- marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From seanlee at tw.ibm.com Thu Dec 3 16:36:54 2015 From: seanlee at tw.ibm.com (Sean S Lee) Date: Thu, 3 Dec 2015 16:36:54 +0000 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? Message-ID: <201512031638.tB3Gc2gj021390@d23av02.au.ibm.com> An HTML attachment was scrubbed... URL: From erich at uw.edu Thu Dec 3 17:46:03 2015 From: erich at uw.edu (Eric Horst) Date: Thu, 3 Dec 2015 09:46:03 -0800 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? In-Reply-To: <201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> <201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> Message-ID: marc, would you or somebody be willing to share a copy of samples/ilm/mmfind to a lowly gpfs 3.5 user? I assume as a sample it might be shareable. I was just about to put some effort into improving some local code we've been running for a long time that was based on gpfs 3.2 samples and is not parallel. Thanks -Eric On Thu, Dec 3, 2015 at 7:08 AM, Marc A Kaplan wrote: > Perhaps you can use mmapplypolicy which has a parallel file scan and > parallel execute whatever script you like against the files found > capabilities to "drive" your backups. > That is what the IBM supported mmbackup does. mmbackup is supported for > use with IBM/Tivoli TSM. Of course IBM would like you to buy TSM, but you > are free to "mashup" mmapplypolicy with any other software you choose. > > Also we recently shipped an improved version of samples/ilm/mmfind that > makes it easy to exploit parallel find and execute without sweating the > details of mmapplypolicy and its peculiar policy SQL/rules. > > -- marc of GPFS > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Thu Dec 3 17:52:46 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Thu, 3 Dec 2015 17:52:46 +0000 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? In-Reply-To: References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> <201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> Message-ID: I can also, for that matter. This is drifting from Netbackup, but we use an hourly mmapply policy that calls a LIST mechanism with a callback that invokes a script that just takes all of the files output and pulls the file names out of them for rsync to our TSM server. The hourly searches the most recent filesets, and there is a daily that catches anything that might have been missed. The callback script is pretty simple, it just parses out the file name field from the index output results that come from the policy execution and sorts them and renames them to new files, then a multi-threaded worker backup process (we call it backupd) scans the directory for more things to do and invokes rsync ?files-from to get the file list. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Eric Horst Sent: Thursday, December 03, 2015 12:46 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? marc, would you or somebody be willing to share a copy of samples/ilm/mmfind to a lowly gpfs 3.5 user? I assume as a sample it might be shareable. I was just about to put some effort into improving some local code we've been running for a long time that was based on gpfs 3.2 samples and is not parallel. Thanks -Eric On Thu, Dec 3, 2015 at 7:08 AM, Marc A Kaplan > wrote: Perhaps you can use mmapplypolicy which has a parallel file scan and parallel execute whatever script you like against the files found capabilities to "drive" your backups. That is what the IBM supported mmbackup does. mmbackup is supported for use with IBM/Tivoli TSM. Of course IBM would like you to buy TSM, but you are free to "mashup" mmapplypolicy with any other software you choose. Also we recently shipped an improved version of samples/ilm/mmfind that makes it easy to exploit parallel find and execute without sweating the details of mmapplypolicy and its peculiar policy SQL/rules. -- marc of GPFS _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Dec 3 19:01:48 2015 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 3 Dec 2015 14:01:48 -0500 Subject: [gpfsug-discuss] samples/ilm/mmfind In-Reply-To: References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com><201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> Message-ID: <201512031901.tB3J1tjM023997@d03av03.boulder.ibm.com> Complete, correct operation of mmfind relies on some functional updates in Spectrum Scale that were not available until 4.1.1. Since "it's only a sample" we will accept bug reports on an informal basis for mmfind operating in the 4.1.1 (or later) environment. (If you happened to get a copy of the newer mmfind code and attempted to retroport it to run with an older version of GPFS, you would get what you get, results would depend which flags and/or features you attempted to exercise - do so with even more risk than running other /samples/ code, and do not plan on getting any support.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jez.tucker at gpfsug.org Thu Dec 3 20:06:00 2015 From: jez.tucker at gpfsug.org (Jez Tucker) Date: Thu, 3 Dec 2015 20:06:00 +0000 Subject: [gpfsug-discuss] Anyone else using Veritas NetBackup with GPFS? In-Reply-To: References: <147828caa4a041f991cf985978b5ef3c@mbxpsc3.winmail.deshaw.com> <201512031508.tB3F8RdQ010223@d03av01.boulder.ibm.com> Message-ID: <5660A0A8.400@gpfsug.org> Hi all If anyone has anything that they would like to share out, I suggest putting it in the UG github repo at: https://github.com/gpfsug/gpfsug-tools If you can't use git, by all means send it directly to me at: jez.tucker at gpfsug.org and I'll add it in for you. All the best, Jez On 03/12/15 17:46, Eric Horst wrote: > marc, would you or somebody be willing to share a copy of > samples/ilm/mmfind to a lowly gpfs 3.5 user? I assume as a sample it > might be shareable. I was just about to put some effort into improving > some local code we've been running for a long time that was based on > gpfs 3.2 samples and is not parallel. > > Thanks > > -Eric > > > > On Thu, Dec 3, 2015 at 7:08 AM, Marc A Kaplan > wrote: > > Perhaps you can use mmapplypolicy which has a parallel file scan > and parallel execute whatever script you like against the files > found capabilities to "drive" your backups. > That is what the IBM supported mmbackup does. mmbackup is > supported for use with IBM/Tivoli TSM. Of course IBM would like > you to buy TSM, but you are free to "mashup" mmapplypolicy with > any other software you choose. > > Also we recently shipped an improved version of samples/ilm/mmfind > that makes it easy to exploit parallel find and execute without > sweating the details of mmapplypolicy and its peculiar policy > SQL/rules. > > -- marc of GPFS > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sjhoward at iu.edu Fri Dec 4 03:45:12 2015 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Fri, 4 Dec 2015 03:45:12 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Message-ID: <1449200712160.50526@iu.edu> Hi All, At our site, we have very recently (as of ~48 hours ago) configured one of our supercomputers (an x86 cluster containing about 315 nodes) to be a GPFS client cluster and to access our core GPFS cluster using a remote mount, per the instuctions in the GFPS Advanced Administration Guide. In addition to allowing remote access from this newly-configured client cluster, we also export the filesystem via NFSv3 to two other supercomputers in our data center. We do not use the GPFS CNFS solution to provide NFS mounts. Instead, we use CTDB to manage NFS on the four core-cluster client nodes that re-export the filesystem. The exports of NFSv3 managed by CTDB pre-date the client GPFS cluster deployment. Since deploying GPFS clients onto the one supercomputer, we have been experiencing a great deal of flapping in our CTDB layer. It's difficult to sort out what is causing what, but I can identify a handful of the symptoms that we're seeing: 1) In the CTDB logs of all the NFS server nodes, we see numerous complaints (on some nodes this is multiple times a day) that rpc.mountd is not running and is being restarted, i.e., "ERROR: MOUNTD is not running. Trying to restart it." 2) In syslog, rpc.mountd can be seen complaining that it is unable to bind to a socket and that an address is already in use, i.e., "rpc.mountd[16869]: Could not bind socket: (98) Address already in use" The rpc.mountd daemon on these nodes is manually constrained to use port 597. The mountd daemon seems able to listen for UDP connections on this port, but not for TCP connections. However, investigating `lsof` and `netstat` reveals no process that is using port 597 and preventing rpc.mountd from using it. 3) We also see nfsd failing its CTDB health check several times a day, i.e., "Event script timed out : 60.nfs monitor count : 0 pid : 7172" Both the non-running state of rpc.mountd and the failure of nfsd to pass its CTDB health checks are causing multiple nodes in the NFS export cluster to become "UNHEALTHY" (the CTDB designation for it) multiple times a day, resulting in a lot of flapping and passing IP addresses back and forth. I should mention here that nfsd on these nodes was running without any problems for the last month up until the night when we deployed the GPFS client cluster. After that deployment, the host of problems listed above suddenly started up. I should also mention that the new client GPFS cluster is running quite nicely, although it is generating a lot more open network sockets on the core-cluster side. We believe that the NFS problems starting at the same time as the GPFS client deployment is not a coincidence, and are inclined to conclude that something about deploying GPFS clients on the supercomputer in question is destabilizing the NFS instances running on the clients that belong to the core cluster. Our current hypothesis is that introducing all of these new GPFS clients has caused contention for some resource on the core-cluster client nodes (ports?, open file handles?, something else?) and GPFS is winning out over NFS. Does anyone have experience with running NFS and GPFS together in such an environment, especially with CTDB as a high-availability daemon? Has anyone perhaps seen these kinds of problems before or have any ideas as to what may be causing them? We're happy to provide any additional diagnostics that the group would like to see in order to investigate. As always, we very much appreciate any help that you are able to provide. Thank you so much! Stewart Howard Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Fri Dec 4 13:00:41 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Fri, 4 Dec 2015 13:00:41 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <1449200712160.50526@iu.edu> References: <1449200712160.50526@iu.edu> Message-ID: <213126e43615407095bff80214d58fbf@mbxtoa3.winmail.deshaw.com> One thing that we discovered very early on using CTDB (or CNFS for that matter) with GPFS is the importance of having the locking/sharing part of ctdb *not* be on the same filesystem that it is exporting. If they are the same, then as soon as the back-end main filesystem gets heavily loaded, ctdb will start timing out tickles and then you'll have all kinds of intermittent and inconvenient failures, often with manual recovery needed afterwards. We took some of the flash that we use for metadata and created a special cluster filesystem on that that has the ctdb locking database on it. Now, if the back-end main filesystem gets slow, it's just slow for all clients, instead of slow for GPFS clients and unavailable for NFS clients because all of the ctdb checks have failed. Sent from my android device. -----Original Message----- From: "Howard, Stewart Jameson" To: "gpfsug-discuss at spectrumscale.org" Cc: "Garrison, E Chris" Sent: Thu, 03 Dec 2015 22:45 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi All, At our site, we have very recently (as of ~48 hours ago) configured one of our supercomputers (an x86 cluster containing about 315 nodes) to be a GPFS client cluster and to access our core GPFS cluster using a remote mount, per the instuctions in the GFPS Advanced Administration Guide. In addition to allowing remote access from this newly-configured client cluster, we also export the filesystem via NFSv3 to two other supercomputers in our data center. We do not use the GPFS CNFS solution to provide NFS mounts. Instead, we use CTDB to manage NFS on the four core-cluster client nodes that re-export the filesystem. The exports of NFSv3 managed by CTDB pre-date the client GPFS cluster deployment. Since deploying GPFS clients onto the one supercomputer, we have been experiencing a great deal of flapping in our CTDB layer. It's difficult to sort out what is causing what, but I can identify a handful of the symptoms that we're seeing: 1) In the CTDB logs of all the NFS server nodes, we see numerous complaints (on some nodes this is multiple times a day) that rpc.mountd is not running and is being restarted, i.e., ?ERROR: MOUNTD is not running. Trying to restart it.? 2) In syslog, rpc.mountd can be seen complaining that it is unable to bind to a socket and that an address is already in use, i.e., ?rpc.mountd[16869]: Could not bind socket: (98) Address already in use? The rpc.mountd daemon on these nodes is manually constrained to use port 597. The mountd daemon seems able to listen for UDP connections on this port, but not for TCP connections. However, investigating `lsof` and `netstat` reveals no process that is using port 597 and preventing rpc.mountd from using it. 3) We also see nfsd failing its CTDB health check several times a day, i.e., ?Event script timed out : 60.nfs monitor count : 0 pid : 7172? Both the non-running state of rpc.mountd and the failure of nfsd to pass its CTDB health checks are causing multiple nodes in the NFS export cluster to become ?UNHEALTHY? (the CTDB designation for it) multiple times a day, resulting in a lot of flapping and passing IP addresses back and forth. I should mention here that nfsd on these nodes was running without any problems for the last month up until the night when we deployed the GPFS client cluster. After that deployment, the host of problems listed above suddenly started up. I should also mention that the new client GPFS cluster is running quite nicely, although it is generating a lot more open network sockets on the core-cluster side. We believe that the NFS problems starting at the same time as the GPFS client deployment is not a coincidence, and are inclined to conclude that something about deploying GPFS clients on the supercomputer in question is destabilizing the NFS instances running on the clients that belong to the core cluster. Our current hypothesis is that introducing all of these new GPFS clients has caused contention for some resource on the core-cluster client nodes (ports?, open file handles?, something else?) and GPFS is winning out over NFS. Does anyone have experience with running NFS and GPFS together in such an environment, especially with CTDB as a high-availability daemon? Has anyone perhaps seen these kinds of problems before or have any ideas as to what may be causing them? We're happy to provide any additional diagnostics that the group would like to see in order to investigate. As always, we very much appreciate any help that you are able to provide. Thank you so much! Stewart Howard Indiana University -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Fri Dec 4 14:49:09 2015 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Fri, 4 Dec 2015 15:49:09 +0100 Subject: [gpfsug-discuss] 4.2 - Performance Collector node - Scaling In-Reply-To: References: <3E1BCECB-9B7C-4C87-85A4-5326CB175C69@nuance.com> <201512031458.tB3EwTDE010953@d06av05.portsmouth.uk.ibm.com> Message-ID: Hi Bob, providing crisp numbers here is a bit difficult. First it depends how powerful (CPU, memory) the machine is where the collector runs on. But even more it depends on the sampling frequency for the metrics that you have configured in your sensor configuration. If you collect every 100s instead of every second you get 1/100th of the data and will scale to much more nodes. Therefore those numbers are more like guidelines and the real limits depend on you individual configuration. Mit freundlichen Gr??en / Kind regards IBM Spectrum Scale Dr. Alexander Wolf-Reber Spectrum Scale GUI development lead Department M069 / Spectrum Scale Software Development +49-6131-84-6521 a.wolf-reber at de.ibm.com IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz / Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 03/12/2015 16:18 Subject: Re: [gpfsug-discuss] 4.2 - Performance Collector node - Scaling Hi Mike ? Thanks, The documentation says ?at least 400? but doesn?t define an acceptable ratio. If you mean ?don?t do more than 400? then it should state that. Bob Oesterlin Sr Storage Engineer, Nuance Communications From: on behalf of Michael Garwood7 Reply-To: gpfsug main discussion list Date: Thursday, December 3, 2015 at 8:57 AM To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] 4.2 - Performance Collector node - Scaling Hi Bob, Yes, 350 nodes should be fine since it is under the 400 acceptable limit. Generally the only concern with a large number of sensors is the volume of data you may need to sift through. Regards, Michael Spectrum Scale Developer _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Fri Dec 4 14:58:46 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Fri, 4 Dec 2015 14:58:46 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <213126e43615407095bff80214d58fbf@mbxtoa3.winmail.deshaw.com> References: <1449200712160.50526@iu.edu> <213126e43615407095bff80214d58fbf@mbxtoa3.winmail.deshaw.com> Message-ID: <13DCE6CE-75AD-4C3B-A0B3-9ED224649B5D@vanderbilt.edu> Hi Stewart, We use the GPFS CNFS solution for NFS mounts and Sernet-Samba and CTDB for SAMBA mounts and that works well for us overall (we?ve been using this solution for over 2 years at this point). I guess I would ask why you chose to use CTDB instead of CNFS for NFS mounts?? I?ll also add that we are eagerly looking forward to doing some upgrades so that we can potentially use the GPFS Cluster Export Services mechanism going forward? Kevin On Dec 4, 2015, at 7:00 AM, Hughes, Doug > wrote: One thing that we discovered very early on using CTDB (or CNFS for that matter) with GPFS is the importance of having the locking/sharing part of ctdb *not* be on the same filesystem that it is exporting. If they are the same, then as soon as the back-end main filesystem gets heavily loaded, ctdb will start timing out tickles and then you'll have all kinds of intermittent and inconvenient failures, often with manual recovery needed afterwards. We took some of the flash that we use for metadata and created a special cluster filesystem on that that has the ctdb locking database on it. Now, if the back-end main filesystem gets slow, it's just slow for all clients, instead of slow for GPFS clients and unavailable for NFS clients because all of the ctdb checks have failed. Sent from my android device. -----Original Message----- From: "Howard, Stewart Jameson" > To: "gpfsug-discuss at spectrumscale.org" > Cc: "Garrison, E Chris" > Sent: Thu, 03 Dec 2015 22:45 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi All, At our site, we have very recently (as of ~48 hours ago) configured one of our supercomputers (an x86 cluster containing about 315 nodes) to be a GPFS client cluster and to access our core GPFS cluster using a remote mount, per the instuctions in the GFPS Advanced Administration Guide. In addition to allowing remote access from this newly-configured client cluster, we also export the filesystem via NFSv3 to two other supercomputers in our data center. We do not use the GPFS CNFS solution to provide NFS mounts. Instead, we use CTDB to manage NFS on the four core-cluster client nodes that re-export the filesystem. The exports of NFSv3 managed by CTDB pre-date the client GPFS cluster deployment. Since deploying GPFS clients onto the one supercomputer, we have been experiencing a great deal of flapping in our CTDB layer. It's difficult to sort out what is causing what, but I can identify a handful of the symptoms that we're seeing: 1) In the CTDB logs of all the NFS server nodes, we see numerous complaints (on some nodes this is multiple times a day) that rpc.mountd is not running and is being restarted, i.e., ?ERROR: MOUNTD is not running. Trying to restart it.? 2) In syslog, rpc.mountd can be seen complaining that it is unable to bind to a socket and that an address is already in use, i.e., ?rpc.mountd[16869]: Could not bind socket: (98) Address already in use? The rpc.mountd daemon on these nodes is manually constrained to use port 597. The mountd daemon seems able to listen for UDP connections on this port, but not for TCP connections. However, investigating `lsof` and `netstat` reveals no process that is using port 597 and preventing rpc.mountd from using it. 3) We also see nfsd failing its CTDB health check several times a day, i.e., ?Event script timed out : 60.nfs monitor count : 0 pid : 7172? Both the non-running state of rpc.mountd and the failure of nfsd to pass its CTDB health checks are causing multiple nodes in the NFS export cluster to become ?UNHEALTHY? (the CTDB designation for it) multiple times a day, resulting in a lot of flapping and passing IP addresses back and forth. I should mention here that nfsd on these nodes was running without any problems for the last month up until the night when we deployed the GPFS client cluster. After that deployment, the host of problems listed above suddenly started up. I should also mention that the new client GPFS cluster is running quite nicely, although it is generating a lot more open network sockets on the core-cluster side. We believe that the NFS problems starting at the same time as the GPFS client deployment is not a coincidence, and are inclined to conclude that something about deploying GPFS clients on the supercomputer in question is destabilizing the NFS instances running on the clients that belong to the core cluster. Our current hypothesis is that introducing all of these new GPFS clients has caused contention for some resource on the core-cluster client nodes (ports?, open file handles?, something else?) and GPFS is winning out over NFS. Does anyone have experience with running NFS and GPFS together in such an environment, especially with CTDB as a high-availability daemon? Has anyone perhaps seen these kinds of problems before or have any ideas as to what may be causing them? We're happy to provide any additional diagnostics that the group would like to see in order to investigate. As always, we very much appreciate any help that you are able to provide. Thank you so much! Stewart Howard Indiana University _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chair at spectrumscale.org Mon Dec 7 10:33:16 2015 From: chair at spectrumscale.org (GPFS UG Chair (Simon Thompson)) Date: Mon, 07 Dec 2015 10:33:16 +0000 Subject: [gpfsug-discuss] CIUK User Group Meeting Message-ID: I had a question about when the UG meeting is at Comppuing Insight this week it. Its tomorrow morning (8th), at 10am. Just a reminder that you need to register for CIUK if you are coming along: http://www.stfc.ac.uk/news-events-and-publications/events/computing-insight -uk-2015/ Agenda for the meeting is: * IBM - Introducing Spectrum Scale 4.2, GUI, QoS, 4.2.1 onwards * Vic Cornell (DDN) - How to ruin a perfectly good GPFS file system * Marc Roskow (Seagate) There is also time for discussion etc. See you tomorrow if you are coming along! Simon From sjhoward at iu.edu Mon Dec 7 17:23:34 2015 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Mon, 7 Dec 2015 17:23:34 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Message-ID: <1449509014517.19529@iu.edu> Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Dec 7 17:53:20 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 7 Dec 2015 17:53:20 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <1449509014517.19529@iu.edu> References: <1449509014517.19529@iu.edu> Message-ID: <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven?t been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH? Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Douglas.Hughes at DEShawResearch.com Tue Dec 8 13:33:14 2015 From: Douglas.Hughes at DEShawResearch.com (Hughes, Doug) Date: Tue, 8 Dec 2015 13:33:14 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <1449509014517.19529@iu.edu> References: <1449509014517.19529@iu.edu> Message-ID: When we started using GPFS, 3.3 time frame, we had a lot of issues with running different meta-applications at the same time.. snapshots, mmapplypolicy, mmdelsnapshot, etc. So we ended up using a locking mechanism around all of these to ensure that they were the only thing running at a given time. That mostly eliminated lock-ups, which were unfortunately common before then. I haven't tried removing it since. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Howard, Stewart Jameson Sent: Monday, December 07, 2015 12:24 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Dec 8 14:14:44 2015 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 8 Dec 2015 14:14:44 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> Message-ID: This may not be at all applicable to your situation, but we?re creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven?t seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though ? are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven?t been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH? Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Dec 8 14:33:26 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 8 Dec 2015 14:33:26 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> Message-ID: <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu> Hi Richard, We went from GPFS 3.5.0.26 (where we also had zero problems with snapshot deletion) to GPFS 4.1.0.8 this past August and immediately hit the snapshot deletion bug (it?s some sort of race condition). It?s not pleasant ? to recover we had to unmount the affected filesystem from both clusters, which didn?t exactly make our researchers happy. But the good news is that there is an efix available for it if you?re on the 4.1.0 series and I am 99% sure that the bug has also been fixed in the last several PTF?s for the 4.1.1 series. That?s not the only bug we hit when going to 4.1.0.8 so my personal advice / opinion would be to bypass 4.1.0 and go straight to 4.1.1 or 4.2 when it comes out. We are planning on going to 4.2 as soon as feasible ? it looks like it?s much more stable plus has some new features (compression!) that we are very interested in. Again, my 2 cents worth. Kevin On Dec 8, 2015, at 8:14 AM, Sobey, Richard A > wrote: This may not be at all applicable to your situation, but we?re creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven?t seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though ? are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven?t been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH? Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Dec 8 14:56:56 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 8 Dec 2015 14:56:56 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu> References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> , <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu> Message-ID: 4.2.0 is out. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 December 2015 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Richard, We went from GPFS 3.5.0.26 (where we also had zero problems with snapshot deletion) to GPFS 4.1.0.8 this past August and immediately hit the snapshot deletion bug (it?s some sort of race condition). It?s not pleasant ? to recover we had to unmount the affected filesystem from both clusters, which didn?t exactly make our researchers happy. But the good news is that there is an efix available for it if you?re on the 4.1.0 series and I am 99% sure that the bug has also been fixed in the last several PTF?s for the 4.1.1 series. That?s not the only bug we hit when going to 4.1.0.8 so my personal advice / opinion would be to bypass 4.1.0 and go straight to 4.1.1 or 4.2 when it comes out. We are planning on going to 4.2 as soon as feasible ? it looks like it?s much more stable plus has some new features (compression!) that we are very interested in. Again, my 2 cents worth. Kevin On Dec 8, 2015, at 8:14 AM, Sobey, Richard A > wrote: This may not be at all applicable to your situation, but we?re creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven?t seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though ? are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven?t been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH? Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 From sjhoward at iu.edu Tue Dec 8 20:19:10 2015 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Tue, 8 Dec 2015 20:19:10 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> , <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu>, Message-ID: <1449605949971.76189@iu.edu> Hi All, An update on this. As events have unfolded, we have noticed a new symptom (cause?) that correlates very well, in time, with the instability we've been seeing on our protocol nodes. Specifically, we are seeing three nodes among the remote-cluster clients that were recently deployed that are getting repeatedly expelled from the cluster and then recovered. The expulsion-recovery cycles seem to go in fits and starts. They usually last about 20 to 30 minutes and will involve one, two, or even three of these nodes getting expelled and then rejoining, sometimes as many as ten or twelve times before things calm down. We're not sure if these expulsions are *causing* the troubles that we're having, but the fact that seem to coincide so well seems very suspicious. Also, during one of these events yesterday, I myself saw a `cp` operation wait forever to start during a time period that later, from logs, appeared to be a expulsion-recovery cycle for one of these nodes. Currently, we're investigating: 1) Problems with networking hardware between our home cluster and these remote-cluster nodes. 2) Misconfiguration of those nodes that breaks connectivity somehow. 3) Load or resource depletion on the problem nodes that may cause them to be unresponsive. On the CTDB front, we've increased CTDB's tolerance for unresponsiveness in the filesystem and hope that will at least keep the front end from going crazy when the filesystem becomes unresponsive. Has anybody seen a cluster suffer so badly from membership-thrashing by remote-cluster nodes? Is there a way to "blacklist" nodes that don't play nicely until they can be fixed? Any suggestions of conditions that might cause repeated expulsions? Thanks so much for your help! Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, December 8, 2015 9:56 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting 4.2.0 is out. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 December 2015 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Richard, We went from GPFS 3.5.0.26 (where we also had zero problems with snapshot deletion) to GPFS 4.1.0.8 this past August and immediately hit the snapshot deletion bug (it?s some sort of race condition). It?s not pleasant ? to recover we had to unmount the affected filesystem from both clusters, which didn?t exactly make our researchers happy. But the good news is that there is an efix available for it if you?re on the 4.1.0 series and I am 99% sure that the bug has also been fixed in the last several PTF?s for the 4.1.1 series. That?s not the only bug we hit when going to 4.1.0.8 so my personal advice / opinion would be to bypass 4.1.0 and go straight to 4.1.1 or 4.2 when it comes out. We are planning on going to 4.2 as soon as feasible ? it looks like it?s much more stable plus has some new features (compression!) that we are very interested in. Again, my 2 cents worth. Kevin On Dec 8, 2015, at 8:14 AM, Sobey, Richard A > wrote: This may not be at all applicable to your situation, but we?re creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven?t seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though ? are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven?t been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH? Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Tue Dec 8 22:00:05 2015 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Tue, 8 Dec 2015 22:00:05 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <1449605949971.76189@iu.edu> References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> , <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu>, <1449605949971.76189@iu.edu> Message-ID: <858195fae73441fc9e65085c1d32071f@mbxtoa1.winmail.deshaw.com> One similar incident I've seen is if a filesystem is configured with too low a "-n numNodes" value for the number of nodes actually mounting (or remote mounting) the filesystem, then the cluster may become overloaded, lease renewals may be affected, and node expels may occur. I'm sure we'll all be interested in a recap of what you actually discover here, when the problem is identified. Thx Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Howard, Stewart Jameson Sent: Tuesday, December 08, 2015 3:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi All, An update on this. As events have unfolded, we have noticed a new symptom (cause?) that correlates very well, in time, with the instability we've been seeing on our protocol nodes. Specifically, we are seeing three nodes among the remote-cluster clients that were recently deployed that are getting repeatedly expelled from the cluster and then recovered. The expulsion-recovery cycles seem to go in fits and starts. They usually last about 20 to 30 minutes and will involve one, two, or even three of these nodes getting expelled and then rejoining, sometimes as many as ten or twelve times before things calm down. We're not sure if these expulsions are *causing* the troubles that we're having, but the fact that seem to coincide so well seems very suspicious. Also, during one of these events yesterday, I myself saw a `cp` operation wait forever to start during a time period that later, from logs, appeared to be a expulsion-recovery cycle for one of these nodes. Currently, we're investigating: 1) Problems with networking hardware between our home cluster and these remote-cluster nodes. 2) Misconfiguration of those nodes that breaks connectivity somehow. 3) Load or resource depletion on the problem nodes that may cause them to be unresponsive. On the CTDB front, we've increased CTDB's tolerance for unresponsiveness in the filesystem and hope that will at least keep the front end from going crazy when the filesystem becomes unresponsive. Has anybody seen a cluster suffer so badly from membership-thrashing by remote-cluster nodes? Is there a way to "blacklist" nodes that don't play nicely until they can be fixed? Any suggestions of conditions that might cause repeated expulsions? Thanks so much for your help! Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, December 8, 2015 9:56 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting 4.2.0 is out. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 December 2015 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Richard, We went from GPFS 3.5.0.26 (where we also had zero problems with snapshot deletion) to GPFS 4.1.0.8 this past August and immediately hit the snapshot deletion bug (it's some sort of race condition). It's not pleasant ... to recover we had to unmount the affected filesystem from both clusters, which didn't exactly make our researchers happy. But the good news is that there is an efix available for it if you're on the 4.1.0 series and I am 99% sure that the bug has also been fixed in the last several PTF's for the 4.1.1 series. That's not the only bug we hit when going to 4.1.0.8 so my personal advice / opinion would be to bypass 4.1.0 and go straight to 4.1.1 or 4.2 when it comes out. We are planning on going to 4.2 as soon as feasible ... it looks like it's much more stable plus has some new features (compression!) that we are very interested in. Again, my 2 cents worth. Kevin On Dec 8, 2015, at 8:14 AM, Sobey, Richard A > wrote: This may not be at all applicable to your situation, but we're creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven't seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though - are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven't been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH... Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chair at spectrumscale.org Wed Dec 9 21:59:33 2015 From: chair at spectrumscale.org (GPFS UG Chair (Simon Thompson)) Date: Wed, 09 Dec 2015 21:59:33 +0000 Subject: [gpfsug-discuss] CIUK User Group Meeting Message-ID: The slides from the three talks are now up on the UG website at: www.spectrumscale.org/presentations/ There's also a blog post from my PoV on the site as well. Thanks again to Patrick, Cameron, Vic and Marc for speaking. Simon On 07/12/2015, 10:33, "gpfsug-discuss-bounces at spectrumscale.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >I had a question about when the UG meeting is at Comppuing Insight this >week it. > >Its tomorrow morning (8th), at 10am. > >Just a reminder that you need to register for CIUK if you are coming >along: >http://www.stfc.ac.uk/news-events-and-publications/events/computing-insigh >t >-uk-2015/ > >Agenda for the meeting is: > * IBM - Introducing Spectrum Scale 4.2, GUI, QoS, 4.2.1 onwards > * Vic Cornell (DDN) - How to ruin a perfectly good GPFS file system > * Marc Roskow (Seagate) > >There is also time for discussion etc. > >See you tomorrow if you are coming along! > >Simon > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Greg.Lehmann at csiro.au Wed Dec 9 23:55:46 2015 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 9 Dec 2015 23:55:46 +0000 Subject: [gpfsug-discuss] CIUK User Group Meeting In-Reply-To: References: Message-ID: Just a small point but the link text for the first talk says 4.1 when it is actually about 4.2. Also do you have Cameron's email address? I have some feedback on the documentation issue. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of GPFS UG Chair (Simon Thompson) Sent: Thursday, 10 December 2015 8:00 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] CIUK User Group Meeting The slides from the three talks are now up on the UG website at: www.spectrumscale.org/presentations/ There's also a blog post from my PoV on the site as well. Thanks again to Patrick, Cameron, Vic and Marc for speaking. Simon On 07/12/2015, 10:33, "gpfsug-discuss-bounces at spectrumscale.org on behalf of GPFS UG Chair (Simon Thompson)" wrote: >I had a question about when the UG meeting is at Comppuing Insight this >week it. > >Its tomorrow morning (8th), at 10am. > >Just a reminder that you need to register for CIUK if you are coming >along: >http://www.stfc.ac.uk/news-events-and-publications/events/computing-ins >igh >t >-uk-2015/ > >Agenda for the meeting is: > * IBM - Introducing Spectrum Scale 4.2, GUI, QoS, 4.2.1 onwards > * Vic Cornell (DDN) - How to ruin a perfectly good GPFS file system > * Marc Roskow (Seagate) > >There is also time for discussion etc. > >See you tomorrow if you are coming along! > >Simon > > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Luke.Raimbach at crick.ac.uk Thu Dec 10 13:46:14 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 10 Dec 2015 13:46:14 +0000 Subject: [gpfsug-discuss] Restriping GPFS Metadata Message-ID: Hi All, Some years ago I remember adding more metadata SSDs to a GPFS 3.5 file system (system pool with MD only disks) and then trying to restripe the metadata. It didn?t work and I asked about it, only to discover that metadata doesn't get restriped. Has this changed? Does it matter if MD is not restriped? I ask because I'll probably want to add more MD SSDs to a new system in the near future. Cheers, Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From jonathan at buzzard.me.uk Thu Dec 10 13:56:52 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 10 Dec 2015 13:56:52 +0000 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: References: Message-ID: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> On Thu, 2015-12-10 at 13:46 +0000, Luke Raimbach wrote: > Hi All, > > Some years ago I remember adding more metadata SSDs to a GPFS 3.5 file > system (system pool with MD only disks) and then trying to restripe the > metadata. > > It didn?t work and I asked about it, only to discover that metadata > doesn't get restriped. > > Has this changed? Does it matter if MD is not restriped? I ask because > I'll probably want to add more MD SSDs to a new system in the near > future. > Hum, that is I believe inaccurate. Metadata does get restriped in at least the case where you move it from one set of disks to another set of disks. It should also get restriped it you change the replication factor. However I am pretty sure that it gets restriped without the necessity to move it from one set of disks to another as well. The caveat is that you cannot restripe *just* the metadata. You have to restripe the whole file system... Or at least that used to be the case and maybe why you have the idea the metadata didn't get restriped. Whether this has changed in 4.x is another matter that perhaps someone from IBM can answer. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 10 14:04:29 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 10 Dec 2015 14:04:29 +0000 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> References: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> Message-ID: <5D0BD493-42DC-4AC6-8851-6AD2A35891ED@vanderbilt.edu> Hi All, We recently moved all of our metadata off of spinning hard drives to SSD?s via restriping. You can restripe only a specific pool with the ?-P? option, so if you have only your metadata disks in the system pool then you can definitely do this? Kevin, who does not work for IBM? ;-) > On Dec 10, 2015, at 7:56 AM, Jonathan Buzzard wrote: > > On Thu, 2015-12-10 at 13:46 +0000, Luke Raimbach wrote: >> Hi All, >> >> Some years ago I remember adding more metadata SSDs to a GPFS 3.5 file >> system (system pool with MD only disks) and then trying to restripe the >> metadata. >> >> It didn?t work and I asked about it, only to discover that metadata >> doesn't get restriped. >> >> Has this changed? Does it matter if MD is not restriped? I ask because >> I'll probably want to add more MD SSDs to a new system in the near >> future. >> > > Hum, that is I believe inaccurate. Metadata does get restriped in at > least the case where you move it from one set of disks to another set of > disks. It should also get restriped it you change the replication > factor. However I am pretty sure that it gets restriped without the > necessity to move it from one set of disks to another as well. > > The caveat is that you cannot restripe *just* the metadata. You have to > restripe the whole file system... Or at least that used to be the case > and maybe why you have the idea the metadata didn't get restriped. > Whether this has changed in 4.x is another matter that perhaps someone > from IBM can answer. > > JAB. > > -- > Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk > Fife, United Kingdom. > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Luke.Raimbach at crick.ac.uk Thu Dec 10 14:05:00 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Thu, 10 Dec 2015 14:05:00 +0000 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> References: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> Message-ID: > On Thu, 2015-12-10 at 13:46 +0000, Luke Raimbach wrote: > > Hi All, > > > > Some years ago I remember adding more metadata SSDs to a GPFS 3.5 file > > system (system pool with MD only disks) and then trying to restripe > > the metadata. > > > > It didn?t work and I asked about it, only to discover that metadata > > doesn't get restriped. > > > > Has this changed? Does it matter if MD is not restriped? I ask because > > I'll probably want to add more MD SSDs to a new system in the near > > future. > > > > Hum, that is I believe inaccurate. Metadata does get restriped in at least the > case where you move it from one set of disks to another set of disks. It should > also get restriped it you change the replication factor. However I am pretty sure > that it gets restriped without the necessity to move it from one set of disks to > another as well. > > The caveat is that you cannot restripe *just* the metadata. You have to restripe > the whole file system... Or at least that used to be the case and maybe why you > have the idea the metadata didn't get restriped. > Whether this has changed in 4.x is another matter that perhaps someone from > IBM can answer. Ah yes I remember now. I was wanting to rebalance the disk usage (rather than move on to new disks - I knew this would work obviously). You're right in that I had to restripe the whole file system and this would have taken forever, so I just didn't bother! Cheers Luke. The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From jonathan at buzzard.me.uk Thu Dec 10 15:05:16 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Thu, 10 Dec 2015 15:05:16 +0000 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: <5D0BD493-42DC-4AC6-8851-6AD2A35891ED@vanderbilt.edu> References: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> <5D0BD493-42DC-4AC6-8851-6AD2A35891ED@vanderbilt.edu> Message-ID: <1449759916.4059.44.camel@buzzard.phy.strath.ac.uk> On Thu, 2015-12-10 at 14:04 +0000, Buterbaugh, Kevin L wrote: > Hi All, > > We recently moved all of our metadata off of spinning hard drives to > SSD?s via restriping. You can restripe only a specific pool with the > ?-P? option, so if you have only your metadata disks in the system pool > then you can definitely do this? > You are right the restriping can be done by pool, so this is I guess another argument for having only metadata disks in the system pool, with an appropriate policy so that data hits other pools. Typically/often the system pool has both metadata and data disks in, and what Luke was referring to is the fact you can't restripe just the metadata disks in a pool, so what should have been a relatively quick restriping of a few hundred GB of metadata all of a sudden becomes depending on the size of the data disks in your system pool a much longer operation; potentially to the point where you don't bother. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From sjhoward at iu.edu Thu Dec 10 16:14:21 2015 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Thu, 10 Dec 2015 16:14:21 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <858195fae73441fc9e65085c1d32071f@mbxtoa1.winmail.deshaw.com> References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> , <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu>, <1449605949971.76189@iu.edu>, <858195fae73441fc9e65085c1d32071f@mbxtoa1.winmail.deshaw.com> Message-ID: <1449764061478.4880@iu.edu> Hi Again Everybody, Ok, so we got resolution on this. Recall that I had said we'd just added ~300 remote cluster GPFS clients and started having problems with CTDB the very same day... Among those clients, there were three that had misconfigured firewalls, such that they could reach our home cluster nodes on port 1191, but our home cluster nodes could *not* reach them on 1191 *or* on any of the ephemeral ports. This situation played absolute *havoc* with the stability of the filesystem. From what we could tell, it seemed that these three nodes would establish a harmless-looking connection and mount the filesystem. However, as soon as one of them acquired a resource (lock token or similar?) that the home cluster needed back...watch out! In the GPFS logs on our side, we would see messages asking for the expulsion of these nodes about 4 - 5 times per day and a ton of messages about timeouts when trying to contact them. These nodes would then re-join the cluster, since they could contact us, and this would entail repeated "delay N seconds for recovery" events. During these recovery periods, the filesystem would become unresponsive for up to 60 or more seconds at a time. This seemed to cause various NFS processes to fall on their faces. Sometimes, the victim would be nfsd itself; other times, it would be rpc.mountd. CTDB would then come check on NFS, find that it was floundering, and start a recovery run. To make things worse, at those very times the CTDB shared accounting files would *also* be unavailable since they reside on the same GPFS filesystem that they are serving (thanks to Doug for pointing out the flaw in this design and we're currently looking for an alternate home for these shared files). This all added up to a *lot* of flapping, in NFS as well as with CTDB itself. However, the problems with CTDB/NFS were a *symptom* in this case, not a root cause. The *cause* was the imperfect connectivity of just three out of 300 new clients. I think the moral of the story here is this: if you're adding remote cluster clients, make *absolutely* sure that all communications work going both ways between your home cluster and *every* new client. If there is asymmetrical connectivity such as we had last week, you are in for one wild ride. I would also point out that the flapping did not stop until we resolved connectivity for *all* of the clients, so remember that even having one single half-connected client is poisonous to your stability. Thanks to everybody for all of your help! Unless something changes, I'm declaring that our site is out of the woods on this one :) Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul Sent: Tuesday, December 8, 2015 5:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting One similar incident I've seen is if a filesystem is configured with too low a "-n numNodes" value for the number of nodes actually mounting (or remote mounting) the filesystem, then the cluster may become overloaded, lease renewals may be affected, and node expels may occur. I'm sure we'll all be interested in a recap of what you actually discover here, when the problem is identified. Thx Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Howard, Stewart Jameson Sent: Tuesday, December 08, 2015 3:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi All, An update on this. As events have unfolded, we have noticed a new symptom (cause?) that correlates very well, in time, with the instability we've been seeing on our protocol nodes. Specifically, we are seeing three nodes among the remote-cluster clients that were recently deployed that are getting repeatedly expelled from the cluster and then recovered. The expulsion-recovery cycles seem to go in fits and starts. They usually last about 20 to 30 minutes and will involve one, two, or even three of these nodes getting expelled and then rejoining, sometimes as many as ten or twelve times before things calm down. We're not sure if these expulsions are *causing* the troubles that we're having, but the fact that seem to coincide so well seems very suspicious. Also, during one of these events yesterday, I myself saw a `cp` operation wait forever to start during a time period that later, from logs, appeared to be a expulsion-recovery cycle for one of these nodes. Currently, we're investigating: 1) Problems with networking hardware between our home cluster and these remote-cluster nodes. 2) Misconfiguration of those nodes that breaks connectivity somehow. 3) Load or resource depletion on the problem nodes that may cause them to be unresponsive. On the CTDB front, we've increased CTDB's tolerance for unresponsiveness in the filesystem and hope that will at least keep the front end from going crazy when the filesystem becomes unresponsive. Has anybody seen a cluster suffer so badly from membership-thrashing by remote-cluster nodes? Is there a way to "blacklist" nodes that don't play nicely until they can be fixed? Any suggestions of conditions that might cause repeated expulsions? Thanks so much for your help! Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, December 8, 2015 9:56 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting 4.2.0 is out. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 December 2015 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Richard, We went from GPFS 3.5.0.26 (where we also had zero problems with snapshot deletion) to GPFS 4.1.0.8 this past August and immediately hit the snapshot deletion bug (it's some sort of race condition). It's not pleasant ... to recover we had to unmount the affected filesystem from both clusters, which didn't exactly make our researchers happy. But the good news is that there is an efix available for it if you're on the 4.1.0 series and I am 99% sure that the bug has also been fixed in the last several PTF's for the 4.1.1 series. That's not the only bug we hit when going to 4.1.0.8 so my personal advice / opinion would be to bypass 4.1.0 and go straight to 4.1.1 or 4.2 when it comes out. We are planning on going to 4.2 as soon as feasible ... it looks like it's much more stable plus has some new features (compression!) that we are very interested in. Again, my 2 cents worth. Kevin On Dec 8, 2015, at 8:14 AM, Sobey, Richard A > wrote: This may not be at all applicable to your situation, but we're creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven't seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though - are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven't been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH... Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From makaplan at us.ibm.com Thu Dec 10 17:26:59 2015 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 10 Dec 2015 12:26:59 -0500 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: <1449759916.4059.44.camel@buzzard.phy.strath.ac.uk> References: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> <5D0BD493-42DC-4AC6-8851-6AD2A35891ED@vanderbilt.edu> <1449759916.4059.44.camel@buzzard.phy.strath.ac.uk> Message-ID: <201512101727.tBAHRDRB011715@d03av01.boulder.ibm.com> The concept of "metadata" disks pre-dates the addition of POOLs. Correct use of policy SET POOL rules and assignment of disks or SSDs to pools makes metadata only disks a redundant, potentially confusing concept that the new comer to GPFS (ahem... Spectrum Scale FS) can ignore. Metadata always goes to system pool. Other file data can be directed or MIGRATed to any pool you like, using SET POOL or MIGRATE rules. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 21994 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Dec 10 17:38:54 2015 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 10 Dec 2015 17:38:54 +0000 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: <201512101727.tBAHRDRB011715@d03av01.boulder.ibm.com> References: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> <5D0BD493-42DC-4AC6-8851-6AD2A35891ED@vanderbilt.edu> <1449759916.4059.44.camel@buzzard.phy.strath.ac.uk> <201512101727.tBAHRDRB011715@d03av01.boulder.ibm.com> Message-ID: Hi Marc, Unfortunately, I find the first paragraph of your response ? confusing. :-O I understand your 1st sentence ? but isn?t one of the benefits of having only metadata disks in the system pool the ability to then have separate block sizes for your metadata and your data? If so, isn?t the simplest way to do that to have metadataOnly and dataOnly disks? I recognize that they may not be the *only* way to accomplish that? Kevin On Dec 10, 2015, at 11:26 AM, Marc A Kaplan > wrote: The concept of "metadata" disks pre-dates the addition of POOLs. Correct use of policy SET POOL rules and assignment of disks or SSDs to pools makes metadata only disks a redundant, potentially confusing concept that the new comer to GPFS (ahem... Spectrum Scale FS) can ignore. Metadata always goes to system pool. Other file data can be directed or MIGRATed to any pool you like, using SET POOL or MIGRATE rules. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Dec 10 19:42:59 2015 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 10 Dec 2015 14:42:59 -0500 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: References: <1449755812.4059.38.camel@buzzard.phy.strath.ac.uk> <5D0BD493-42DC-4AC6-8851-6AD2A35891ED@vanderbilt.edu> <1449759916.4059.44.camel@buzzard.phy.strath.ac.uk> <201512101727.tBAHRDRB011715@d03av01.boulder.ibm.com> Message-ID: <201512101943.tBAJh56G015713@d01av03.pok.ibm.com> I may be off somewhat, but my recollection is that support for metadata-blocksize != (other)blocksize came after POOLs. And the doc for mmcrfs seems to indicate that if you want to specify a metadata-blocksize != blocksize, then system pool must be comprised of all metadataonly disks. Which is consistent with my understanding that all the disks in a given pool have equal blocksize. So my point was, if you want to segregate data from metadata, think only about pools first. Then worry about blocksizes and metadataonly disks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Thu Dec 10 23:05:21 2015 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Thu, 10 Dec 2015 23:05:21 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <1449764061478.4880@iu.edu> References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> , <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu>, <1449605949971.76189@iu.edu>, <858195fae73441fc9e65085c1d32071f@mbxtoa1.winmail.deshaw.com> <1449764061478.4880@iu.edu> Message-ID: Should the process of connecting the clusters automatically test out the connectivity both ways for us? Feature request for a future version? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Howard, Stewart Jameson Sent: Friday, 11 December 2015 2:14 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Again Everybody, Ok, so we got resolution on this. Recall that I had said we'd just added ~300 remote cluster GPFS clients and started having problems with CTDB the very same day... Among those clients, there were three that had misconfigured firewalls, such that they could reach our home cluster nodes on port 1191, but our home cluster nodes could *not* reach them on 1191 *or* on any of the ephemeral ports. This situation played absolute *havoc* with the stability of the filesystem. From what we could tell, it seemed that these three nodes would establish a harmless-looking connection and mount the filesystem. However, as soon as one of them acquired a resource (lock token or similar?) that the home cluster needed back...watch out! In the GPFS logs on our side, we would see messages asking for the expulsion of these nodes about 4 - 5 times per day and a ton of messages about timeouts when trying to contact them. These nodes would then re-join the cluster, since they could contact us, and this would entail repeated "delay N seconds for recovery" events. During these recovery periods, the filesystem would become unresponsive for up to 60 or more seconds at a time. This seemed to cause various NFS processes to fall on their faces. Sometimes, the victim would be nfsd itself; other times, it would be rpc.mountd. CTDB would then come check on NFS, find that it was floundering, and start a recovery run. To make things worse, at those very times the CTDB shared accounting files would *also* be unavailable since they reside on the same GPFS filesystem that they are serving (thanks to Doug for pointing out the flaw in this design and we're currently looking for an alternate home for these shared files). This all added up to a *lot* of flapping, in NFS as well as with CTDB itself. However, the problems with CTDB/NFS were a *symptom* in this case, not a root cause. The *cause* was the imperfect connectivity of just three out of 300 new clients. I think the moral of the story here is this: if you're adding remote cluster clients, make *absolutely* sure that all communications work going both ways between your home cluster and *every* new client. If there is asymmetrical connectivity such as we had last week, you are in for one wild ride. I would also point out that the flapping did not stop until we resolved connectivity for *all* of the clients, so remember that even having one single half-connected client is poisonous to your stability. Thanks to everybody for all of your help! Unless something changes, I'm declaring that our site is out of the woods on this one :) Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul Sent: Tuesday, December 8, 2015 5:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting One similar incident I've seen is if a filesystem is configured with too low a "-n numNodes" value for the number of nodes actually mounting (or remote mounting) the filesystem, then the cluster may become overloaded, lease renewals may be affected, and node expels may occur. I'm sure we'll all be interested in a recap of what you actually discover here, when the problem is identified. Thx Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Howard, Stewart Jameson Sent: Tuesday, December 08, 2015 3:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi All, An update on this. As events have unfolded, we have noticed a new symptom (cause?) that correlates very well, in time, with the instability we've been seeing on our protocol nodes. Specifically, we are seeing three nodes among the remote-cluster clients that were recently deployed that are getting repeatedly expelled from the cluster and then recovered. The expulsion-recovery cycles seem to go in fits and starts. They usually last about 20 to 30 minutes and will involve one, two, or even three of these nodes getting expelled and then rejoining, sometimes as many as ten or twelve times before things calm down. We're not sure if these expulsions are *causing* the troubles that we're having, but the fact that seem to coincide so well seems very suspicious. Also, during one of these events yesterday, I myself saw a `cp` operation wait forever to start during a time period that later, from logs, appeared to be a expulsion-recovery cycle for one of these nodes. Currently, we're investigating: 1) Problems with networking hardware between our home cluster and these remote-cluster nodes. 2) Misconfiguration of those nodes that breaks connectivity somehow. 3) Load or resource depletion on the problem nodes that may cause them to be unresponsive. On the CTDB front, we've increased CTDB's tolerance for unresponsiveness in the filesystem and hope that will at least keep the front end from going crazy when the filesystem becomes unresponsive. Has anybody seen a cluster suffer so badly from membership-thrashing by remote-cluster nodes? Is there a way to "blacklist" nodes that don't play nicely until they can be fixed? Any suggestions of conditions that might cause repeated expulsions? Thanks so much for your help! Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, December 8, 2015 9:56 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting 4.2.0 is out. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 December 2015 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Richard, We went from GPFS 3.5.0.26 (where we also had zero problems with snapshot deletion) to GPFS 4.1.0.8 this past August and immediately hit the snapshot deletion bug (it's some sort of race condition). It's not pleasant ... to recover we had to unmount the affected filesystem from both clusters, which didn't exactly make our researchers happy. But the good news is that there is an efix available for it if you're on the 4.1.0 series and I am 99% sure that the bug has also been fixed in the last several PTF's for the 4.1.1 series. That's not the only bug we hit when going to 4.1.0.8 so my personal advice / opinion would be to bypass 4.1.0 and go straight to 4.1.1 or 4.2 when it comes out. We are planning on going to 4.2 as soon as feasible ... it looks like it's much more stable plus has some new features (compression!) that we are very interested in. Again, my 2 cents worth. Kevin On Dec 8, 2015, at 8:14 AM, Sobey, Richard A > wrote: This may not be at all applicable to your situation, but we're creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven't seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though - are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven't been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH... Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chris.hunter at yale.edu Fri Dec 11 00:11:29 2015 From: chris.hunter at yale.edu (Chris hunter) Date: Thu, 10 Dec 2015 19:11:29 -0500 Subject: [gpfsug-discuss] e: GPFS Remote Cluster Co-existence with, CTDB/NFS Re-exporting Message-ID: <566A14B1.3070107@yale.edu> Hi Stewart, Can't comment on NFS nor snapshot issues. However its common to change filesystem parameters "maxMissedPingTimeout" and "minMissedPingTimeout" when adding remote clusters. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Tuning%20Parameters Below is an earlier gpfsug thread on about remote cluster expels: > Re: [gpfsug-discuss] data interface and management infercace. > *Bob Oesterlin*oester at > gmail.com > /Mon Jul 13 18:42:47 BST 2015/ > Some thoughts on node expels, based on the last 2-3 months of "expel hell" > here. We've spent a lot of time looking at this issue, across multiple > clusters. A big thanks to IBM for helping us center in on the right issues. > First, you need to understand if the expels are due to "expired lease" > message, or expels due to "communication issues". It sounds like you are > talking about the latter. In the case of nodes being expelled due to > communication issues, it's more likely the problem in related to network > congestion. This can occur at many levels - the node, the network, or the > switch. > > When it's a communication issue, changing prams like "missed ping timeout" > isn't going to help you. The problem for us ended up being that GPFS wasn't > getting a response to a periodic "keep alive" poll to the node, and after > 300 seconds, it declared the node dead and expelled it. You can tell if > this is the issue by starting to look at the RPC waiters just before the > expel. If you see something like "Waiting for poll on sock" RPC, that the > node is waiting for that periodic poll to return, and it's not seeing it. > The response is either lost in the network, sitting on the network queue, > or the node is too busy to send it. You may also see RPC's like "waiting > for exclusive use of connection" RPC - this is another clear indication of > network congestion. > > Look at the GPFSUG presentions (http://www.gpfsug.org/presentations/) for > one by Jason Hick (NERSC) - he also talks about these issues. You need to > take a look at net.ipv4.tcp_wmem and net.ipv4.tcp_rmem, especially if you > have client nodes that are on slower network interfaces. > > In our case, it was a number of factors - adjusting these settings, > looking at congestion at the switch level, and some physical hardware > issues. > > Bob Oesterlin, Sr Storage Engineer, Nuance Communications > robert.oesterlin at nuance.com > chris hunter chris.hunter at yale.edu > -----Original Message----- > Sent: Friday, 11 December 2015 2:14 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting > > Hi Again Everybody, > > Ok, so we got resolution on this. Recall that I had said we'd just added ~300 remote cluster GPFS clients and started having problems with CTDB the very same day... > > Among those clients, there were three that had misconfigured firewalls, such that they could reach our home cluster nodes on port 1191, but our home cluster nodes could*not* reach them on 1191*or* on any of the ephemeral ports. This situation played absolute*havoc* with the stability of the filesystem. From what we could tell, it seemed that these three nodes would establish a harmless-looking connection and mount the filesystem. However, as soon as one of them acquired a resource (lock token or similar?) that the home cluster needed back...watch out! > > In the GPFS logs on our side, we would see messages asking for the expulsion of these nodes about 4 - 5 times per day and a ton of messages about timeouts when trying to contact them. These nodes would then re-join the cluster, since they could contact us, and this would entail repeated "delay N seconds for recovery" events. > > During these recovery periods, the filesystem would become unresponsive for up to 60 or more seconds at a time. This seemed to cause various NFS processes to fall on their faces. Sometimes, the victim would be nfsd itself; other times, it would be rpc.mountd. CTDB would then come check on NFS, find that it was floundering, and start a recovery run. To make things worse, at those very times the CTDB shared accounting files would*also* be unavailable since they reside on the same GPFS filesystem that they are serving (thanks to Doug for pointing out the flaw in this design and we're currently looking for an alternate home for these shared files). > > This all added up to a*lot* of flapping, in NFS as well as with CTDB itself. However, the problems with CTDB/NFS were a*symptom* in this case, not a root cause. The*cause* was the imperfect connectivity of just three out of 300 new clients. I think the moral of the story here is this: if you're adding remote cluster clients, make*absolutely* sure that all communications work going both ways between your home cluster and*every* new client. If there is asymmetrical connectivity such as we had last week, you are in for one wild ride. I would also point out that the flapping did not stop until we resolved connectivity for*all* of the clients, so remember that even having one single half-connected client is poisonous to your stability. > > Thanks to everybody for all of your help! Unless something changes, I'm declaring that our site is out of the woods on this one > > Stewart From sjhoward at iu.edu Fri Dec 11 21:20:10 2015 From: sjhoward at iu.edu (Howard, Stewart Jameson) Date: Fri, 11 Dec 2015 21:20:10 +0000 Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> , <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu>, <1449605949971.76189@iu.edu>, <858195fae73441fc9e65085c1d32071f@mbxtoa1.winmail.deshaw.com> <1449764061478.4880@iu.edu>, Message-ID: <1449868810019.2038@iu.edu> That's an interesting idea. In the meantime, I was thinking of writing a script that would survey the connectivity on nodes that we're going to add. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Greg.Lehmann at csiro.au Sent: Thursday, December 10, 2015 6:05 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Should the process of connecting the clusters automatically test out the connectivity both ways for us? Feature request for a future version? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Howard, Stewart Jameson Sent: Friday, 11 December 2015 2:14 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Again Everybody, Ok, so we got resolution on this. Recall that I had said we'd just added ~300 remote cluster GPFS clients and started having problems with CTDB the very same day... Among those clients, there were three that had misconfigured firewalls, such that they could reach our home cluster nodes on port 1191, but our home cluster nodes could *not* reach them on 1191 *or* on any of the ephemeral ports. This situation played absolute *havoc* with the stability of the filesystem. From what we could tell, it seemed that these three nodes would establish a harmless-looking connection and mount the filesystem. However, as soon as one of them acquired a resource (lock token or similar?) that the home cluster needed back...watch out! In the GPFS logs on our side, we would see messages asking for the expulsion of these nodes about 4 - 5 times per day and a ton of messages about timeouts when trying to contact them. These nodes would then re-join the cluster, since they could contact us, and this would entail repeated "delay N seconds for recovery" events. During these recovery periods, the filesystem would become unresponsive for up to 60 or more seconds at a time. This seemed to cause various NFS processes to fall on their faces. Sometimes, the victim would be nfsd itself; other times, it would be rpc.mountd. CTDB would then come check on NFS, find that it was floundering, and start a recovery run. To make things worse, at those very times the CTDB shared accounting files would *also* be unavailable since they reside on the same GPFS filesystem that they are serving (thanks to Doug for pointing out the flaw in this design and we're currently looking for an alternate home for these shared files). This all added up to a *lot* of flapping, in NFS as well as with CTDB itself. However, the problems with CTDB/NFS were a *symptom* in this case, not a root cause. The *cause* was the imperfect connectivity of just three out of 300 new clients. I think the moral of the story here is this: if you're adding remote cluster clients, make *absolutely* sure that all communications work going both ways between your home cluster and *every* new client. If there is asymmetrical connectivity such as we had last week, you are in for one wild ride. I would also point out that the flapping did not stop until we resolved connectivity for *all* of the clients, so remember that even having one single half-connected client is poisonous to your stability. Thanks to everybody for all of your help! Unless something changes, I'm declaring that our site is out of the woods on this one :) Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul Sent: Tuesday, December 8, 2015 5:00 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting One similar incident I've seen is if a filesystem is configured with too low a "-n numNodes" value for the number of nodes actually mounting (or remote mounting) the filesystem, then the cluster may become overloaded, lease renewals may be affected, and node expels may occur. I'm sure we'll all be interested in a recap of what you actually discover here, when the problem is identified. Thx Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Howard, Stewart Jameson Sent: Tuesday, December 08, 2015 3:19 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi All, An update on this. As events have unfolded, we have noticed a new symptom (cause?) that correlates very well, in time, with the instability we've been seeing on our protocol nodes. Specifically, we are seeing three nodes among the remote-cluster clients that were recently deployed that are getting repeatedly expelled from the cluster and then recovered. The expulsion-recovery cycles seem to go in fits and starts. They usually last about 20 to 30 minutes and will involve one, two, or even three of these nodes getting expelled and then rejoining, sometimes as many as ten or twelve times before things calm down. We're not sure if these expulsions are *causing* the troubles that we're having, but the fact that seem to coincide so well seems very suspicious. Also, during one of these events yesterday, I myself saw a `cp` operation wait forever to start during a time period that later, from logs, appeared to be a expulsion-recovery cycle for one of these nodes. Currently, we're investigating: 1) Problems with networking hardware between our home cluster and these remote-cluster nodes. 2) Misconfiguration of those nodes that breaks connectivity somehow. 3) Load or resource depletion on the problem nodes that may cause them to be unresponsive. On the CTDB front, we've increased CTDB's tolerance for unresponsiveness in the filesystem and hope that will at least keep the front end from going crazy when the filesystem becomes unresponsive. Has anybody seen a cluster suffer so badly from membership-thrashing by remote-cluster nodes? Is there a way to "blacklist" nodes that don't play nicely until they can be fixed? Any suggestions of conditions that might cause repeated expulsions? Thanks so much for your help! Stewart ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Simon Thompson (Research Computing - IT Services) Sent: Tuesday, December 8, 2015 9:56 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting 4.2.0 is out. Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L [Kevin.Buterbaugh at Vanderbilt.Edu] Sent: 08 December 2015 14:33 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Richard, We went from GPFS 3.5.0.26 (where we also had zero problems with snapshot deletion) to GPFS 4.1.0.8 this past August and immediately hit the snapshot deletion bug (it's some sort of race condition). It's not pleasant ... to recover we had to unmount the affected filesystem from both clusters, which didn't exactly make our researchers happy. But the good news is that there is an efix available for it if you're on the 4.1.0 series and I am 99% sure that the bug has also been fixed in the last several PTF's for the 4.1.1 series. That's not the only bug we hit when going to 4.1.0.8 so my personal advice / opinion would be to bypass 4.1.0 and go straight to 4.1.1 or 4.2 when it comes out. We are planning on going to 4.2 as soon as feasible ... it looks like it's much more stable plus has some new features (compression!) that we are very interested in. Again, my 2 cents worth. Kevin On Dec 8, 2015, at 8:14 AM, Sobey, Richard A > wrote: This may not be at all applicable to your situation, but we're creating thousands of snapshots per day of many independent filesets. The same script(s) call mmdelsnapshot, too. We haven't seen any particular issues with this. GPFS 3.5. I note with intereste your bug report below about 4.1.0.x though - are you able to elaborate? From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh, Kevin L Sent: 07 December 2015 17:53 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting Hi Stewart, We had been running mmcrsnapshot with a ~700 node remote cluster accessing the filesystem for a couple of years now without issue. However, we haven't been running it for a little while because there is a very serious bug in GPFS 4.1.0.x relating to snapshot *deletion*. There is an efix for it and we are in the process of rolling that out, but will not try to resume snapshots until both clusters are fully updated. HTH... Kevin On Dec 7, 2015, at 11:23 AM, Howard, Stewart Jameson > wrote: Hi All, Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out. An update on our situation: we have actually uncovered another clue since my last posting. One thing that this now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running the `mmcrsnapshot` so, as an experiment, we moved the snapshot to happen at 1a. After this change, the late-night flapping has moved to 1a and now happens reliably every night at that time. I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks and am wondering if that is still a problem with the `mmcrsnapthost` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem? Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far. Thanks so much to everyone for your help :) Stewart _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From volobuev at us.ibm.com Fri Dec 11 23:46:03 2015 From: volobuev at us.ibm.com (Yuri L Volobuev) Date: Fri, 11 Dec 2015 15:46:03 -0800 Subject: [gpfsug-discuss] Restriping GPFS Metadata In-Reply-To: References: Message-ID: <201512112346.tBBNkDZa001896@d03av05.boulder.ibm.com> Hi Kevin, The short answer is: no, it's not possible to do a rebalance (mmrestripefs -b) for metadata but not data with current GPFS code. This is something we plan on addressing in a future code update. It doesn't really help to separate data and metadata in different pools. Using -P system results in some metadata being processed, but not all. All of this has to do with the mechanics of GPFS PIT code. If you haven't already, I recommend reading "Long-running GPFS administration commands" [ https://ibm.biz/BdHnX8] doc for background. The layout of storage pools is something that's orthogonal to how PIT scans work. It's easy to rebalance just system metadata (inode file, block and inode allocation maps, a few smaller system files): just ^C mmrestripefs once it gets into Phase 4 (User metadata). Rebalancing user metadata (directories, indirect blocks, EA overflow blocks) requires running mmrestripefs -b to completion, and this indeed can take a while on a large fs. If one tries to speed things up using -P system, then all inodes that don't belong to the system pool will get summarily skipped, including the metadata associated with those inodes. A code change is needed to enable metadata-only rebalancing. yuri -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Dec 14 00:50:00 2015 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 14 Dec 2015 00:50:00 +0000 Subject: [gpfsug-discuss] Multiple nodes hung with 'waiting for the flush flag to commit metadata' Message-ID: Any idea what this hang condition is all about? I have several nodes all in a sort of deadlock, with the following long waiters. I know I?m probably looking at a PMR, but ? any other clues on what be at work? GPFS 4.1.0.7 on Linux, RH 6.6. They all seem to go back to nodes where 'waiting for the flush flag to commit metadata? and 'waiting for WW lock? are the RPCs in question. 0x7F418C0C07D0 ( 18869) waiting 203445.829057195 seconds, InodePrefetchWorkerThread: on ThCond 0x7F41FC02A338 (0x7F41FC02A338) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.105.68 0x7F418C0C66D0 ( 18876) waiting 196174.410095017 seconds, InodePrefetchWorkerThread: on ThCond 0x7F40AC8AB798 (0x7F40AC8AB798) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.86.102 0x7F9C5C0041F0 ( 17394) waiting 218020.428801654 seconds, SyncHandlerThread: on ThCond 0x1801970D678 (0xFFFFC9001970D678) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata' 0x7FEAC0037F10 ( 25547) waiting 158003.275282910 seconds, InodePrefetchWorkerThread: on ThCond 0x7FEBA400E398 (0x7FEBA400E398) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.86.159 0x7F04B0028E80 ( 11757) waiting 159426.694691653 seconds, InodePrefetchWorkerThread: on ThCond 0x7F0400002A28 (0x7F0400002A28) (MsgRecordCondvar), reason 'RPC wait' for tmMsgTellAcquire1 on node 10.30.43.226 0x7F04D0013AA0 ( 21781) waiting 157723.199692503 seconds, InodePrefetchWorkerThread: on ThCond 0x7F0454010358 (0x7F0454010358) (MsgRecordCondvar), reason 'RPC wait' for tmMsgTellAcquire1 on node 10.30.43.227 0x7F6F480041F0 ( 12964) waiting 209491.171775225 seconds, SyncHandlerThread: on ThCond 0x18022F3C490 (0xFFFFC90022F3C490) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata' 0x7F03180041F0 ( 12338) waiting 212486.480961641 seconds, SyncHandlerThread: on ThCond 0x18027186220 (0xFFFFC90027186220) (LkObjCondvar), reason 'waiting for WW lock' 0x7F1EB00041F0 ( 12598) waiting 215765.483202551 seconds, SyncHandlerThread: on ThCond 0x18026FDFDD0 (0xFFFFC90026FDFDD0) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata' 0x7F83540041F0 ( 12605) waiting 75189.385741859 seconds, SyncHandlerThread: on ThCond 0x18021DAA7F8 (0xFFFFC90021DAA7F8) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata' 0x7FF10C20DA10 ( 34836) waiting 202382.680544395 seconds, InodePrefetchWorkerThread: on ThCond 0x7FF1640026C8 (0x7FF1640026C8) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.86.77 0x7F839806DBF0 ( 49131) waiting 158295.556723453 seconds, InodePrefetchWorkerThread: on ThCond 0x7F82B0000FF8 (0x7F82B0000FF8) (MsgRecordCondvar), reason 'RPC wait' for tmMsgTellAcquire1 on node 10.30.43.226 Bob Oesterlin Sr Storage Engineer, Nuance Communications 507-269-0413 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dfischer at de.ibm.com Mon Dec 14 16:28:25 2015 From: dfischer at de.ibm.com (Dietmar Fischer) Date: Mon, 14 Dec 2015 17:28:25 +0100 Subject: [gpfsug-discuss] Plugin Requirement / GUI Message-ID: <201512141628.tBEGSXwD009740@d06av03.portsmouth.uk.ibm.com> We keep hearing that customers, who have seen the new Spectrum Scale GUI, are asking for a "plugin" capability. Now there are many ways to offer a plugin capability and we are wondering what exactly is driving this request and what is required? Is it about using the new GUI and extending it by other potentially user defined panels and data (performance, health, state, events, configuration, ...) and if so, what exactly? Or would you like to use data from the GUI to be used in other tools or dashboards and if so, which information? I am looking forward to getting your feedback and better understanding the requirement(s)! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at buzzard.me.uk Mon Dec 14 17:04:43 2015 From: jonathan at buzzard.me.uk (Jonathan Buzzard) Date: Mon, 14 Dec 2015 17:04:43 +0000 Subject: [gpfsug-discuss] Plugin Requirement / GUI In-Reply-To: <201512141628.tBEGSXwD009740@d06av03.portsmouth.uk.ibm.com> References: <201512141628.tBEGSXwD009740@d06av03.portsmouth.uk.ibm.com> Message-ID: <1450112683.4059.78.camel@buzzard.phy.strath.ac.uk> On Mon, 2015-12-14 at 17:28 +0100, Dietmar Fischer wrote: > We keep hearing that customers, who have seen the new Spectrum Scale > GUI, are asking for a "plugin" capability. > > Now there are many ways to offer a plugin capability and we are > wondering what exactly is driving this request and what is required? > > Is it about using the new GUI and extending it by other potentially > user defined panels and data (performance, health, state, events, > configuration, ...) and if so, what exactly? > Or would you like to use data from the GUI to be used in other tools > or dashboards and if so, which information? > > I am looking forward to getting your feedback and better understanding > the requirement(s)! The idea revolves I think around the sort of things that you can do in SMIT (and I vote for a port of the running man for nostalgia purposes). The idea would be so that you could say take some input from the user in a form and then run all the commands to say provision a share at your site, so that someone who knows very little about GPFS can create a fileset, link it to the files system, and then create the SMB/NFS shares. Another use would be to run a range of specific "maintenance" task for your site, again run by someone with minimal knowledge of GPFS. Basically look at what you can do in SMIT :-) JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. From chekh at stanford.edu Tue Dec 15 20:34:38 2015 From: chekh at stanford.edu (Alex Chekholko) Date: Tue, 15 Dec 2015 12:34:38 -0800 Subject: [gpfsug-discuss] unusual node expels? Message-ID: <5670795E.2070200@stanford.edu> Hi all, I had a RHEL6.3 / MLNX OFED 1.5.3 / GPFS 3.5.0.10 cluster, which was working fine. We tried to upgrade some stuff (our mistake!), specifically the Mellanox firmwares and the OS and switched to in-built CentOS OFED. So now I have CentOS 6.7 / GPFS 3.5.0.29 cluster where the GPFS client nodes refuse to stay connected. Here is a typical log: [root at cn1 ~]# cat /var/adm/ras/mmfs.log.latest Tue Dec 15 12:21:38 PST 2015: runmmfs starting Removing old /var/adm/ras/mmfs.log.* files: Unloading modules from /lib/modules/2.6.32-573.8.1.el6.x86_64/extra Loading modules from /lib/modules/2.6.32-573.8.1.el6.x86_64/extra Module Size Used by mmfs26 1836054 0 mmfslinux 330095 1 mmfs26 tracedev 43757 2 mmfs26,mmfslinux Tue Dec 15 12:21:39.230 2015: mmfsd initializing. {Version: 3.5.0.29 Built: Nov 6 2015 15:28:46} ... Tue Dec 15 12:21:40.847 2015: VERBS RDMA starting. Tue Dec 15 12:21:40.849 2015: VERBS RDMA library libibverbs.so.1 (version >= 1.1) loaded and initialized. Tue Dec 15 12:21:40.850 2015: VERBS RDMA verbsRdmasPerNode reduced from 128 to 98 to match (nsdMaxWorkerThreads 96 + (nspdThreadsPerQueue 2 * nspdQueues 1)). Tue Dec 15 12:21:41.122 2015: VERBS RDMA device mlx4_0 port 1 fabnum 0 opened, lid 10, 4x FDR INFINIBAND. Tue Dec 15 12:21:41.123 2015: VERBS RDMA started. Tue Dec 15 12:21:41.626 2015: Connecting to 10.210.16.40 hs-gs-01 Tue Dec 15 12:21:41.627 2015: Connected to 10.210.16.40 hs-gs-01 Tue Dec 15 12:21:41.628 2015: Connecting to 10.210.16.41 hs-gs-02 Tue Dec 15 12:21:41.629 2015: Connected to 10.210.16.41 hs-gs-02 Tue Dec 15 12:21:41.630 2015: Node 10.210.16.41 (hs-gs-02) is now the Group Leader. Tue Dec 15 12:21:41.641 2015: mmfsd ready Tue Dec 15 12:21:41 PST 2015: mmcommon mmfsup invoked. Parameters: 10.210.17.1 10.210.16.41 all Tue Dec 15 12:21:41 PST 2015: mounting /dev/hsgs Tue Dec 15 12:21:41.918 2015: Command: mount hsgs Tue Dec 15 12:21:42.131 2015: Connecting to 10.210.16.42 hs-gs-03 Tue Dec 15 12:21:42.132 2015: Connecting to 10.210.16.43 hs-gs-04 Tue Dec 15 12:21:42.133 2015: Connected to 10.210.16.42 hs-gs-03 Tue Dec 15 12:21:42.134 2015: Connected to 10.210.16.43 hs-gs-04 Tue Dec 15 12:21:42.148 2015: VERBS RDMA connecting to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 Tue Dec 15 12:21:42.149 2015: VERBS RDMA connected to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 0 Tue Dec 15 12:21:42.153 2015: VERBS RDMA connecting to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 Tue Dec 15 12:21:42.154 2015: VERBS RDMA connected to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 1 Tue Dec 15 12:21:42.171 2015: Connecting to 10.210.16.11 hs-ln01.local Tue Dec 15 12:21:42.173 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) Tue Dec 15 12:21:42.174 2015: Retry connection to 10.210.16.11 hs-ln01.local Tue Dec 15 12:21:42.173 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) Tue Dec 15 12:22:55.322 2015: Request sent to 10.210.16.41 (hs-gs-02) to expel 10.210.16.11 (hs-ln01.local) from cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:22:55.323 2015: This node will be expelled from cluster HS-GS-Cluster.hs-gs-01 due to expel msg from 10.210.17.1 (cn1.local) Tue Dec 15 12:22:55.324 2015: This node is being expelled from the cluster. Tue Dec 15 12:22:55.323 2015: Lost membership in cluster HS-GS-Cluster.hs-gs-01. Unmounting file systems. Tue Dec 15 12:22:55.325 2015: VERBS RDMA closed connection to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 Tue Dec 15 12:22:55.327 2015: Cluster Manager connection broke. Probing cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:22:55.328 2015: VERBS RDMA closed connection to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 Tue Dec 15 12:22:56.419 2015: Command: err 2: mount hsgs Tue Dec 15 12:22:56.420 2015: Specified entity, such as a disk or file system, does not exist. mount: No such file or directory Tue Dec 15 12:22:56 PST 2015: finished mounting /dev/hsgs Tue Dec 15 12:22:56.587 2015: Quorum loss. Probing cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:22:57.087 2015: Connecting to 10.210.16.40 hs-gs-01 Tue Dec 15 12:22:57.088 2015: Connected to 10.210.16.40 hs-gs-01 Tue Dec 15 12:22:57.089 2015: Connecting to 10.210.16.41 hs-gs-02 Tue Dec 15 12:22:57.090 2015: Connected to 10.210.16.41 hs-gs-02 Tue Dec 15 12:23:02.090 2015: Connecting to 10.210.16.42 hs-gs-03 Tue Dec 15 12:23:02.092 2015: Connected to 10.210.16.42 hs-gs-03 Tue Dec 15 12:23:49.604 2015: Node 10.210.16.41 (hs-gs-02) is now the Group Leader. Tue Dec 15 12:23:49.614 2015: mmfsd ready Tue Dec 15 12:23:49 PST 2015: mmcommon mmfsup invoked. Parameters: 10.210.17.1 10.210.16.41 all Tue Dec 15 12:23:49 PST 2015: mounting /dev/hsgs Tue Dec 15 12:23:49.866 2015: Command: mount hsgs Tue Dec 15 12:23:49.949 2015: Connecting to 10.210.16.43 hs-gs-04 Tue Dec 15 12:23:49.950 2015: Connected to 10.210.16.43 hs-gs-04 Tue Dec 15 12:23:49.957 2015: VERBS RDMA connecting to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 1 Tue Dec 15 12:23:49.958 2015: VERBS RDMA connected to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 1 Tue Dec 15 12:23:49.962 2015: VERBS RDMA connecting to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 0 Tue Dec 15 12:23:49.963 2015: VERBS RDMA connected to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 0 Tue Dec 15 12:23:49.980 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) Tue Dec 15 12:23:49.981 2015: Retry connection to 10.210.16.11 hs-ln01.local Tue Dec 15 12:23:49.980 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) Tue Dec 15 12:25:05.321 2015: Request sent to 10.210.16.41 (hs-gs-02) to expel 10.210.16.11 (hs-ln01.local) from cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:25:05.322 2015: This node will be expelled from cluster HS-GS-Cluster.hs-gs-01 due to expel msg from 10.210.17.1 (cn1.local) Tue Dec 15 12:25:05.323 2015: This node is being expelled from the cluster. Tue Dec 15 12:25:05.324 2015: Lost membership in cluster HS-GS-Cluster.hs-gs-01. Unmounting file systems. Tue Dec 15 12:25:05.325 2015: VERBS RDMA closed connection to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 1 Tue Dec 15 12:25:05.326 2015: VERBS RDMA closed connection to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 0 Tue Dec 15 12:25:05.327 2015: Cluster Manager connection broke. Probing cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:25:06.413 2015: Command: err 2: mount hsgs Tue Dec 15 12:25:06.414 2015: Specified entity, such as a disk or file system, does not exist. mount: No such file or directory Tue Dec 15 12:25:06 PST 2015: finished mounting /dev/hsgs Tue Dec 15 12:25:06.569 2015: Quorum loss. Probing cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:25:07.069 2015: Connecting to 10.210.16.40 hs-gs-01 Tue Dec 15 12:25:07.070 2015: Connected to 10.210.16.40 hs-gs-01 Tue Dec 15 12:25:07.071 2015: Connecting to 10.210.16.41 hs-gs-02 Tue Dec 15 12:25:07.072 2015: Connected to 10.210.16.41 hs-gs-02 Tue Dec 15 12:25:12.072 2015: Connecting to 10.210.16.42 hs-gs-03 Tue Dec 15 12:25:12.073 2015: Connected to 10.210.16.42 hs-gs-03 Tue Dec 15 12:25:59.585 2015: Node 10.210.16.41 (hs-gs-02) is now the Group Leader. Tue Dec 15 12:25:59.596 2015: mmfsd ready Tue Dec 15 12:25:59 PST 2015: mmcommon mmfsup invoked. Parameters: 10.210.17.1 10.210.16.41 all Tue Dec 15 12:25:59 PST 2015: mounting /dev/hsgs Tue Dec 15 12:25:59.856 2015: Command: mount hsgs Tue Dec 15 12:25:59.934 2015: Connecting to 10.210.16.43 hs-gs-04 Tue Dec 15 12:25:59.935 2015: Connected to 10.210.16.43 hs-gs-04 Tue Dec 15 12:25:59.941 2015: VERBS RDMA connecting to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 Tue Dec 15 12:25:59.942 2015: VERBS RDMA connected to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 0 Tue Dec 15 12:25:59.945 2015: VERBS RDMA connecting to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 Tue Dec 15 12:25:59.947 2015: VERBS RDMA connected to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 1 Tue Dec 15 12:25:59.963 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) Tue Dec 15 12:25:59.964 2015: Retry connection to 10.210.16.11 hs-ln01.local Tue Dec 15 12:25:59.965 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) Tue Dec 15 12:27:15.457 2015: Request sent to 10.210.16.41 (hs-gs-02) to expel 10.210.16.11 (hs-ln01.local) from cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:27:15.458 2015: This node will be expelled from cluster HS-GS-Cluster.hs-gs-01 due to expel msg from 10.210.17.1 (cn1.local) Tue Dec 15 12:27:15.459 2015: This node is being expelled from the cluster. Tue Dec 15 12:27:15.460 2015: Lost membership in cluster HS-GS-Cluster.hs-gs-01. Unmounting file systems. Tue Dec 15 12:27:15.461 2015: VERBS RDMA closed connection to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 Tue Dec 15 12:27:15.462 2015: Cluster Manager connection broke. Probing cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:27:15.463 2015: VERBS RDMA closed connection to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 Tue Dec 15 12:27:16.578 2015: Command: err 2: mount hsgs Tue Dec 15 12:27:16.579 2015: Specified entity, such as a disk or file system, does not exist. mount: No such file or directory Tue Dec 15 12:27:16 PST 2015: finished mounting /dev/hsgs Tue Dec 15 12:27:16.938 2015: Quorum loss. Probing cluster HS-GS-Cluster.hs-gs-01 Tue Dec 15 12:27:17.439 2015: Connecting to 10.210.16.40 hs-gs-01 Tue Dec 15 12:27:17.440 2015: Connected to 10.210.16.40 hs-gs-01 Tue Dec 15 12:27:17.441 2015: Connecting to 10.210.16.41 hs-gs-02 Tue Dec 15 12:27:17.442 2015: Connected to 10.210.16.41 hs-gs-02 Tue Dec 15 12:27:22.442 2015: Connecting to 10.210.16.42 hs-gs-03 Tue Dec 15 12:27:22.443 2015: Connected to 10.210.16.42 hs-gs-03 Tue Dec 15 12:28:09.955 2015: Node 10.210.16.41 (hs-gs-02) is now the Group Leader. Tue Dec 15 12:28:09.965 2015: mmfsd ready Tue Dec 15 12:28:10 PST 2015: mmcommon mmfsup invoked. Parameters: 10.210.17.1 10.210.16.41 all Tue Dec 15 12:28:10 PST 2015: mounting /dev/hsgs Tue Dec 15 12:28:10.222 2015: Command: mount hsgs Tue Dec 15 12:28:10.314 2015: Connecting to 10.210.16.43 hs-gs-04 Tue Dec 15 12:28:10.315 2015: Connected to 10.210.16.43 hs-gs-04 Tue Dec 15 12:28:10.322 2015: VERBS RDMA connecting to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 1 Tue Dec 15 12:28:10.323 2015: VERBS RDMA connected to 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 1 Tue Dec 15 12:28:10.326 2015: VERBS RDMA connecting to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 0 Tue Dec 15 12:28:10.328 2015: VERBS RDMA connected to 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 0 Tue Dec 15 12:28:10.344 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) Tue Dec 15 12:28:10.345 2015: Retry connection to 10.210.16.11 hs-ln01.local Tue Dec 15 12:28:10.346 2015: Close connection to 10.210.16.11 hs-ln01.local (No route to host) All the IB / RDMA stuff looks OK to me, but as soon as the GPFS clients connect, they try to expel each other. The 4 NSD servers seem just fine though. Trying the Mellanox OFED 3.x yields the same results, so somehow I think it's not an IB issue. [root at cn1 ~]# uname -r 2.6.32-573.8.1.el6.x86_64 [root at cn1 ~]# rpm -qa|grep gpfs gpfs.gpl-3.5.0-29.noarch gpfs.docs-3.5.0-29.noarch gpfs.msg.en_US-3.5.0-29.noarch gpfs.base-3.5.0-29.x86_64 Does anyone have any suggestions? Regards, -- chekh at stanford.edu 347-401-4860 chekh at stanford.edu From S.J.Thompson at bham.ac.uk Tue Dec 15 22:50:20 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Tue, 15 Dec 2015 22:50:20 +0000 Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Message-ID: Hi, I;ve just upgraded some of my protocol nodes to 4.2, I noticed on startup that in the logs I get: Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Tue 15 Dec 22:39:12 GMT 2015: mmcesmonitor: Monitor has started pid=18963 Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Error: Cannot connect to server(localhost), port(/var/mmfs/ces/mmcesmonitor.socket): No such file or directory It looks like one EL7, you also need python-ldap installed (perhaps the installer does this, but it should really be a dependency of the RPM if its required?). Anyway, if you see issues, add the python-ldap RPM and it should fix it. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From chekh at stanford.edu Tue Dec 15 23:11:39 2015 From: chekh at stanford.edu (Alex Chekholko) Date: Tue, 15 Dec 2015 15:11:39 -0800 Subject: [gpfsug-discuss] unusual node expels? In-Reply-To: <5670795E.2070200@stanford.edu> References: <5670795E.2070200@stanford.edu> Message-ID: <56709E2B.3050503@stanford.edu> Hi, In the end the error message "no route to host" was the correct one, to be taken at face value. Some iptables rules got accidentally set up on some private network interfaces and so a GPFS node that was already up was not accessible from the GPFS nodes that were coming up next, so they would all be expelled. Regards, Alex On 12/15/2015 12:34 PM, Alex Chekholko wrote: > Hi all, > > I had a RHEL6.3 / MLNX OFED 1.5.3 / GPFS 3.5.0.10 cluster, which was > working fine. > > We tried to upgrade some stuff (our mistake!), specifically the Mellanox > firmwares and the OS and switched to in-built CentOS OFED. > > So now I have > CentOS 6.7 / GPFS 3.5.0.29 cluster where the GPFS client nodes refuse to > stay connected. Here is a typical log: > > > [root at cn1 ~]# cat /var/adm/ras/mmfs.log.latest > Tue Dec 15 12:21:38 PST 2015: runmmfs starting > Removing old /var/adm/ras/mmfs.log.* files: > Unloading modules from /lib/modules/2.6.32-573.8.1.el6.x86_64/extra > Loading modules from /lib/modules/2.6.32-573.8.1.el6.x86_64/extra > Module Size Used by > mmfs26 1836054 0 > mmfslinux 330095 1 mmfs26 > tracedev 43757 2 mmfs26,mmfslinux > Tue Dec 15 12:21:39.230 2015: mmfsd initializing. {Version: 3.5.0.29 > Built: Nov 6 2015 15:28:46} ... > Tue Dec 15 12:21:40.847 2015: VERBS RDMA starting. > Tue Dec 15 12:21:40.849 2015: VERBS RDMA library libibverbs.so.1 > (version >= 1.1) loaded and initialized. > Tue Dec 15 12:21:40.850 2015: VERBS RDMA verbsRdmasPerNode reduced from > 128 to 98 to match (nsdMaxWorkerThreads 96 + (nspdThreadsPerQueue 2 * > nspdQueues 1)). > Tue Dec 15 12:21:41.122 2015: VERBS RDMA device mlx4_0 port 1 fabnum 0 > opened, lid 10, 4x FDR INFINIBAND. > Tue Dec 15 12:21:41.123 2015: VERBS RDMA started. > Tue Dec 15 12:21:41.626 2015: Connecting to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:21:41.627 2015: Connected to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:21:41.628 2015: Connecting to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:21:41.629 2015: Connected to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:21:41.630 2015: Node 10.210.16.41 (hs-gs-02) is now the > Group Leader. > Tue Dec 15 12:21:41.641 2015: mmfsd ready > Tue Dec 15 12:21:41 PST 2015: mmcommon mmfsup invoked. Parameters: > 10.210.17.1 10.210.16.41 all > Tue Dec 15 12:21:41 PST 2015: mounting /dev/hsgs > Tue Dec 15 12:21:41.918 2015: Command: mount hsgs > Tue Dec 15 12:21:42.131 2015: Connecting to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:21:42.132 2015: Connecting to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:21:42.133 2015: Connected to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:21:42.134 2015: Connected to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:21:42.148 2015: VERBS RDMA connecting to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 > Tue Dec 15 12:21:42.149 2015: VERBS RDMA connected to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 0 > Tue Dec 15 12:21:42.153 2015: VERBS RDMA connecting to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 > Tue Dec 15 12:21:42.154 2015: VERBS RDMA connected to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 1 > Tue Dec 15 12:21:42.171 2015: Connecting to 10.210.16.11 hs-ln01.local > > Tue Dec 15 12:21:42.173 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > Tue Dec 15 12:21:42.174 2015: Retry connection to 10.210.16.11 > hs-ln01.local > Tue Dec 15 12:21:42.173 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > Tue Dec 15 12:22:55.322 2015: Request sent to 10.210.16.41 (hs-gs-02) to > expel 10.210.16.11 (hs-ln01.local) from cluster HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:22:55.323 2015: This node will be expelled from cluster > HS-GS-Cluster.hs-gs-01 due to expel msg from 10.210.17.1 (cn1.local) > Tue Dec 15 12:22:55.324 2015: This node is being expelled from the cluster. > Tue Dec 15 12:22:55.323 2015: Lost membership in cluster > HS-GS-Cluster.hs-gs-01. Unmounting file systems. > Tue Dec 15 12:22:55.325 2015: VERBS RDMA closed connection to > 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 > Tue Dec 15 12:22:55.327 2015: Cluster Manager connection broke. Probing > cluster HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:22:55.328 2015: VERBS RDMA closed connection to > 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 > Tue Dec 15 12:22:56.419 2015: Command: err 2: mount hsgs > Tue Dec 15 12:22:56.420 2015: Specified entity, such as a disk or file > system, does not exist. > mount: No such file or directory > Tue Dec 15 12:22:56 PST 2015: finished mounting /dev/hsgs > Tue Dec 15 12:22:56.587 2015: Quorum loss. Probing cluster > HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:22:57.087 2015: Connecting to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:22:57.088 2015: Connected to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:22:57.089 2015: Connecting to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:22:57.090 2015: Connected to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:23:02.090 2015: Connecting to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:23:02.092 2015: Connected to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:23:49.604 2015: Node 10.210.16.41 (hs-gs-02) is now the > Group Leader. > Tue Dec 15 12:23:49.614 2015: mmfsd ready > Tue Dec 15 12:23:49 PST 2015: mmcommon mmfsup invoked. Parameters: > 10.210.17.1 10.210.16.41 all > Tue Dec 15 12:23:49 PST 2015: mounting /dev/hsgs > Tue Dec 15 12:23:49.866 2015: Command: mount hsgs > Tue Dec 15 12:23:49.949 2015: Connecting to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:23:49.950 2015: Connected to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:23:49.957 2015: VERBS RDMA connecting to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 1 > Tue Dec 15 12:23:49.958 2015: VERBS RDMA connected to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 1 > Tue Dec 15 12:23:49.962 2015: VERBS RDMA connecting to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 0 > Tue Dec 15 12:23:49.963 2015: VERBS RDMA connected to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 0 > Tue Dec 15 12:23:49.980 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > Tue Dec 15 12:23:49.981 2015: Retry connection to 10.210.16.11 > hs-ln01.local > Tue Dec 15 12:23:49.980 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > Tue Dec 15 12:25:05.321 2015: Request sent to 10.210.16.41 (hs-gs-02) to > expel 10.210.16.11 (hs-ln01.local) from cluster HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:25:05.322 2015: This node will be expelled from cluster > HS-GS-Cluster.hs-gs-01 due to expel msg from 10.210.17.1 (cn1.local) > Tue Dec 15 12:25:05.323 2015: This node is being expelled from the cluster. > Tue Dec 15 12:25:05.324 2015: Lost membership in cluster > HS-GS-Cluster.hs-gs-01. Unmounting file systems. > Tue Dec 15 12:25:05.325 2015: VERBS RDMA closed connection to > 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 1 > Tue Dec 15 12:25:05.326 2015: VERBS RDMA closed connection to > 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 0 > Tue Dec 15 12:25:05.327 2015: Cluster Manager connection broke. Probing > cluster HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:25:06.413 2015: Command: err 2: mount hsgs > Tue Dec 15 12:25:06.414 2015: Specified entity, such as a disk or file > system, does not exist. > mount: No such file or directory > Tue Dec 15 12:25:06 PST 2015: finished mounting /dev/hsgs > Tue Dec 15 12:25:06.569 2015: Quorum loss. Probing cluster > HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:25:07.069 2015: Connecting to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:25:07.070 2015: Connected to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:25:07.071 2015: Connecting to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:25:07.072 2015: Connected to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:25:12.072 2015: Connecting to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:25:12.073 2015: Connected to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:25:59.585 2015: Node 10.210.16.41 (hs-gs-02) is now the > Group Leader. > Tue Dec 15 12:25:59.596 2015: mmfsd ready > Tue Dec 15 12:25:59 PST 2015: mmcommon mmfsup invoked. Parameters: > 10.210.17.1 10.210.16.41 all > Tue Dec 15 12:25:59 PST 2015: mounting /dev/hsgs > Tue Dec 15 12:25:59.856 2015: Command: mount hsgs > Tue Dec 15 12:25:59.934 2015: Connecting to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:25:59.935 2015: Connected to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:25:59.941 2015: VERBS RDMA connecting to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 > Tue Dec 15 12:25:59.942 2015: VERBS RDMA connected to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 0 > Tue Dec 15 12:25:59.945 2015: VERBS RDMA connecting to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 > Tue Dec 15 12:25:59.947 2015: VERBS RDMA connected to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 1 > Tue Dec 15 12:25:59.963 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > Tue Dec 15 12:25:59.964 2015: Retry connection to 10.210.16.11 > hs-ln01.local > Tue Dec 15 12:25:59.965 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > Tue Dec 15 12:27:15.457 2015: Request sent to 10.210.16.41 (hs-gs-02) to > expel 10.210.16.11 (hs-ln01.local) from cluster HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:27:15.458 2015: This node will be expelled from cluster > HS-GS-Cluster.hs-gs-01 due to expel msg from 10.210.17.1 (cn1.local) > Tue Dec 15 12:27:15.459 2015: This node is being expelled from the cluster. > Tue Dec 15 12:27:15.460 2015: Lost membership in cluster > HS-GS-Cluster.hs-gs-01. Unmounting file systems. > Tue Dec 15 12:27:15.461 2015: VERBS RDMA closed connection to > 10.210.16.41 (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 0 > Tue Dec 15 12:27:15.462 2015: Cluster Manager connection broke. Probing > cluster HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:27:15.463 2015: VERBS RDMA closed connection to > 10.210.16.40 (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 1 > Tue Dec 15 12:27:16.578 2015: Command: err 2: mount hsgs > Tue Dec 15 12:27:16.579 2015: Specified entity, such as a disk or file > system, does not exist. > mount: No such file or directory > Tue Dec 15 12:27:16 PST 2015: finished mounting /dev/hsgs > Tue Dec 15 12:27:16.938 2015: Quorum loss. Probing cluster > HS-GS-Cluster.hs-gs-01 > Tue Dec 15 12:27:17.439 2015: Connecting to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:27:17.440 2015: Connected to 10.210.16.40 hs-gs-01 > Tue Dec 15 12:27:17.441 2015: Connecting to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:27:17.442 2015: Connected to 10.210.16.41 hs-gs-02 > Tue Dec 15 12:27:22.442 2015: Connecting to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:27:22.443 2015: Connected to 10.210.16.42 hs-gs-03 > Tue Dec 15 12:28:09.955 2015: Node 10.210.16.41 (hs-gs-02) is now the > Group Leader. > Tue Dec 15 12:28:09.965 2015: mmfsd ready > Tue Dec 15 12:28:10 PST 2015: mmcommon mmfsup invoked. Parameters: > 10.210.17.1 10.210.16.41 all > Tue Dec 15 12:28:10 PST 2015: mounting /dev/hsgs > Tue Dec 15 12:28:10.222 2015: Command: mount hsgs > Tue Dec 15 12:28:10.314 2015: Connecting to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:28:10.315 2015: Connected to 10.210.16.43 hs-gs-04 > Tue Dec 15 12:28:10.322 2015: VERBS RDMA connecting to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 index 1 > Tue Dec 15 12:28:10.323 2015: VERBS RDMA connected to 10.210.16.41 > (hs-gs-02) on mlx4_0 port 1 fabnum 0 sl 0 index 1 > Tue Dec 15 12:28:10.326 2015: VERBS RDMA connecting to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 index 0 > Tue Dec 15 12:28:10.328 2015: VERBS RDMA connected to 10.210.16.40 > (hs-gs-01) on mlx4_0 port 1 fabnum 0 sl 0 index 0 > Tue Dec 15 12:28:10.344 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > Tue Dec 15 12:28:10.345 2015: Retry connection to 10.210.16.11 > hs-ln01.local > Tue Dec 15 12:28:10.346 2015: Close connection to 10.210.16.11 > hs-ln01.local (No route to host) > > > > All the IB / RDMA stuff looks OK to me, but as soon as the GPFS clients > connect, they try to expel each other. The 4 NSD servers seem just fine > though. Trying the Mellanox OFED 3.x yields the same results, so > somehow I think it's not an IB issue. > > [root at cn1 ~]# uname -r > 2.6.32-573.8.1.el6.x86_64 > [root at cn1 ~]# rpm -qa|grep gpfs > gpfs.gpl-3.5.0-29.noarch > gpfs.docs-3.5.0-29.noarch > gpfs.msg.en_US-3.5.0-29.noarch > gpfs.base-3.5.0-29.x86_64 > > Does anyone have any suggestions? > > Regards, -- Alex Chekholko chekh at stanford.edu 347-401-4860 From MDIETZ at de.ibm.com Wed Dec 16 12:02:27 2015 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 16 Dec 2015 13:02:27 +0100 Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) In-Reply-To: References: Message-ID: <201512161202.tBGC2eda005218@d06av03.portsmouth.uk.ibm.com> Hi, you are right that python-ldap is a required dependency for 4.2 protocol nodes. Please make sure to have the gpfs.protocols-support-4.2.0-0.noarch RPM installed on protocol nodes because this package will enforce the dependencies. >> rpm -qi gpfs.protocols-support-4.2.0-0.noarch Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. [root at p8-10-rhel-71be-01 ~]# rpm -qi gpfs.protocols-support-4.2.0-0.noarch --requires Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. gpfs.base >= 4.2.0 nfs-ganesha >= 2.2 gpfs.smb >= 4.2.0_gpfs spectrum-scale-object >= 4.2.0 python-ldap rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1 Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development System Health Team - Scrum Master IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Phone: +49-6131-84-2027 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" To: "gpfsug-discuss at spectrumscale.org" Date: 12/15/2015 11:50 PM Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I;ve just upgraded some of my protocol nodes to 4.2, I noticed on startup that in the logs I get: Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Tue 15 Dec 22:39:12 GMT 2015: mmcesmonitor: Monitor has started pid=18963 Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Error: Cannot connect to server(localhost), port(/var/mmfs/ces/mmcesmonitor.socket): No such file or directory It looks like one EL7, you also need python-ldap installed (perhaps the installer does this, but it should really be a dependency of the RPM if its required?). Anyway, if you see issues, add the python-ldap RPM and it should fix it. Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Dec 16 12:15:49 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Wed, 16 Dec 2015 12:15:49 +0000 Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) In-Reply-To: <201512161202.tBGC2eda005218@d06av03.portsmouth.uk.ibm.com> References: <201512161202.tBGC2eda005218@d06av03.portsmouth.uk.ibm.com> Message-ID: OK, Iooked at that. This means pulling in all the object and NFS stuff as well onto my server as well. I only run SMB, so I don;'t want lots of other stuff installing as well .. --> Running transaction check ---> Package gpfs.protocols-support.noarch 0:4.2.0-0 will be installed --> Processing Dependency: spectrum-scale-object >= 4.2.0 for package: gpfs.protocols-support-4.2.0-0.noarch --> Processing Dependency: nfs-ganesha >= 2.2 for package: gpfs.protocols-support-4.2.0-0.noarch --> Running transaction check ---> Package gpfs.protocols-support.noarch 0:4.2.0-0 will be installed --> Processing Dependency: spectrum-scale-object >= 4.2.0 for package: gpfs.protocols-support-4.2.0-0.noarch ---> Package nfs-ganesha.x86_64 0:2.3.0-1.el7 will be installed --> Processing Dependency: libntirpc.so.1.3(NTIRPC_1.3.1)(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 --> Processing Dependency: libntirpc.so.1.3()(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 --> Processing Dependency: libjemalloc.so.1()(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 From: > on behalf of Mathias Dietz > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 16 December 2015 at 12:02 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Hi, you are right that python-ldap is a required dependency for 4.2 protocol nodes. Please make sure to have the gpfs.protocols-support-4.2.0-0.noarch RPM installed on protocol nodes because this package will enforce the dependencies. >> rpm -qi gpfs.protocols-support-4.2.0-0.noarch Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. [root at p8-10-rhel-71be-01 ~]# rpm -qi gpfs.protocols-support-4.2.0-0.noarch --requires Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. gpfs.base >= 4.2.0 nfs-ganesha >= 2.2 gpfs.smb >= 4.2.0_gpfs spectrum-scale-object >= 4.2.0 python-ldap rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1 Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development System Health Team - Scrum Master IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Phone: +49-6131-84-2027 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" > To: "gpfsug-discuss at spectrumscale.org" > Date: 12/15/2015 11:50 PM Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, I;ve just upgraded some of my protocol nodes to 4.2, I noticed on startup that in the logs I get: Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Tue 15 Dec 22:39:12 GMT 2015: mmcesmonitor: Monitor has started pid=18963 Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Error: Cannot connect to server(localhost), port(/var/mmfs/ces/mmcesmonitor.socket): No such file or directory It looks like one EL7, you also need python-ldap installed (perhaps the installer does this, but it should really be a dependency of the RPM if its required?). Anyway, if you see issues, add the python-ldap RPM and it should fix it. Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From MDIETZ at de.ibm.com Wed Dec 16 12:43:09 2015 From: MDIETZ at de.ibm.com (Mathias Dietz) Date: Wed, 16 Dec 2015 13:43:09 +0100 Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) In-Reply-To: References: <201512161202.tBGC2eda005218@d06av03.portsmouth.uk.ibm.com> Message-ID: <201512161243.tBGChGwd017486@d06av05.portsmouth.uk.ibm.com> I see your point, but our recommendation is to always install gpfs.protocols-support-4.2.0-0.noarch on protocol nodes, even if a single protocol is used only. This is consistent with how the Spectrum Scale installer is setting up systems. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development System Health Team - Scrum Master IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Phone: +49-6131-84-2027 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" To: gpfsug main discussion list Date: 12/16/2015 01:16 PM Subject: Re: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Sent by: gpfsug-discuss-bounces at spectrumscale.org OK, Iooked at that. This means pulling in all the object and NFS stuff as well onto my server as well. I only run SMB, so I don;'t want lots of other stuff installing as well .. --> Running transaction check ---> Package gpfs.protocols-support.noarch 0:4.2.0-0 will be installed --> Processing Dependency: spectrum-scale-object >= 4.2.0 for package: gpfs.protocols-support-4.2.0-0.noarch --> Processing Dependency: nfs-ganesha >= 2.2 for package: gpfs.protocols-support-4.2.0-0.noarch --> Running transaction check ---> Package gpfs.protocols-support.noarch 0:4.2.0-0 will be installed --> Processing Dependency: spectrum-scale-object >= 4.2.0 for package: gpfs.protocols-support-4.2.0-0.noarch ---> Package nfs-ganesha.x86_64 0:2.3.0-1.el7 will be installed --> Processing Dependency: libntirpc.so.1.3(NTIRPC_1.3.1)(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 --> Processing Dependency: libntirpc.so.1.3()(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 --> Processing Dependency: libjemalloc.so.1()(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 From: on behalf of Mathias Dietz Reply-To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: Wednesday, 16 December 2015 at 12:02 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Hi, you are right that python-ldap is a required dependency for 4.2 protocol nodes. Please make sure to have the gpfs.protocols-support-4.2.0-0.noarch RPM installed on protocol nodes because this package will enforce the dependencies. >> rpm -qi gpfs.protocols-support-4.2.0-0.noarch Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. [root at p8-10-rhel-71be-01 ~]# rpm -qi gpfs.protocols-support-4.2.0-0.noarch --requires Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. gpfs.base >= 4.2.0 nfs-ganesha >= 2.2 gpfs.smb >= 4.2.0_gpfs spectrum-scale-object >= 4.2.0 python-ldap rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1 Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development System Health Team - Scrum Master IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Phone: +49-6131-84-2027 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" < S.J.Thompson at bham.ac.uk> To: "gpfsug-discuss at spectrumscale.org" < gpfsug-discuss at spectrumscale.org> Date: 12/15/2015 11:50 PM Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I;ve just upgraded some of my protocol nodes to 4.2, I noticed on startup that in the logs I get: Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Tue 15 Dec 22:39:12 GMT 2015: mmcesmonitor: Monitor has started pid=18963 Traceback (most recent call last): File "/usr/lpp/mmfs/bin/mmcesmon.py", line 178, in import mmcesmon.CommandHandler File "/usr/lpp/mmfs/lib/mmcesmon/CommandHandler.py", line 29, in from FILEService import FILEService File "/usr/lpp/mmfs/lib/mmcesmon/FILEService.py", line 19, in from ExtAuthMonitor import ActiveDirectoryServiceMonitor File "/usr/lpp/mmfs/lib/mmcesmon/ExtAuthMonitor.py", line 15, in import ldap ImportError: No module named ldap Error: Cannot connect to server(localhost), port(/var/mmfs/ces/mmcesmonitor.socket): No such file or directory It looks like one EL7, you also need python-ldap installed (perhaps the installer does this, but it should really be a dependency of the RPM if its required?). Anyway, if you see issues, add the python-ldap RPM and it should fix it. Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From service at metamodul.com Wed Dec 16 13:39:58 2015 From: service at metamodul.com (service at metamodul.com) Date: Wed, 16 Dec 2015 14:39:58 +0100 (CET) Subject: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting In-Reply-To: <1449868810019.2038@iu.edu> References: <1449509014517.19529@iu.edu> <49D9766B-B369-4896-A466-DB60692FE08F@vanderbilt.edu> , <8E00046D-C44B-4A5C-905B-B56019C94097@vanderbilt.edu>, <1449605949971.76189@iu.edu>, <858195fae73441fc9e65085c1d32071f@mbxtoa1.winmail.deshaw.com> <1449764061478.4880@iu.edu>, <1449868810019.2038@iu.edu> Message-ID: <1915484747.61999.880fe402-e31c-4a50-9e89-04df90ee7e9f.open-xchange@email.1und1.de> ... last week, you are in for one wild ride. I would also point out that the flapping did not stop until we resolved connectivity for *all* of the clients, so remember that even having one single half-connected client is poisonous to your stability. ... In this context i think GPFS should provide somekind of monitoring better than ping. In the good old days remote clusters even over wan might not exist that often but i think it changed pretty much nowadays.. If remote clusters are even out of the management ability/responsiblity of the cluster admin, remote firewalls/network settings can have seriously impact on the local cluster without the ability to fix the problem. Something nobody would like to see. With kind regards Hajo -------------- next part -------------- An HTML attachment was scrubbed... URL: From Luke.Raimbach at crick.ac.uk Wed Dec 16 23:11:29 2015 From: Luke.Raimbach at crick.ac.uk (Luke Raimbach) Date: Wed, 16 Dec 2015 23:11:29 +0000 Subject: [gpfsug-discuss] Cluster ID (mis)match Message-ID: Dear All, Let's pretend: I have three GPFS clusters: two storage clusters (just NSD servers) with one file system per storage cluster; and a client cluster (with just compute nodes). The three clusters all have multi-cluster relationships set up so that all nodes in all clusters can mount the two file systems. Now imagine that the two storage clusters got accidentally provisioned with the same cluster ID. What would happen, please? Special thanks to people who can explain the internal workings of cluster membership lookups for multicluster nodes (I'm interested in the GPFS internals here). For example, where in the GPFS code does the cluster ID make a difference to which cluster manager is contacted? Cheers GPFSUG once more! Luke. Luke Raimbach? Senior HPC Data and Storage Systems Engineer, The Francis Crick Institute, Gibbs Building, 215 Euston Road, London NW1 2BE. E: luke.raimbach at crick.ac.uk W: www.crick.ac.uk The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 215 Euston Road, London NW1 2BE. From S.J.Thompson at bham.ac.uk Thu Dec 17 16:02:12 2015 From: S.J.Thompson at bham.ac.uk (Simon Thompson (Research Computing - IT Services)) Date: Thu, 17 Dec 2015 16:02:12 +0000 Subject: [gpfsug-discuss] 4.2 & protocols (missing dependency?) In-Reply-To: <201512161243.tBGChGwd017486@d06av05.portsmouth.uk.ibm.com> References: <201512161202.tBGC2eda005218@d06av03.portsmouth.uk.ibm.com> <201512161243.tBGChGwd017486@d06av05.portsmouth.uk.ibm.com> Message-ID: See, this sort of thing: "A security vulnerability has been identified in the current levels of IBM Spectrum Scale V4.1.1 thru 4.1.1.3 and V4.2.0.0 that could allow a local unprivileged user, or a user with network access to the IBM Spectrum Scale cluster, to access admin passwords for object storage infrastructure. This vulnerability only affects clusters which have installed and deployed the Object protocol." Is exactly why we don't want to be installing components that we aren't actively using ... Simon From: > on behalf of Mathias Dietz > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 16 December 2015 at 12:43 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 & protocols (missing dependency?) I see your point, but our recommendation is to always install gpfs.protocols-support-4.2.0-0.noarch on protocol nodes, even if a single protocol is used only. This is consistent with how the Spectrum Scale installer is setting up systems. Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development System Health Team - Scrum Master IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Phone: +49-6131-84-2027 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)" > To: gpfsug main discussion list > Date: 12/16/2015 01:16 PM Subject: Re: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK, Iooked at that. This means pulling in all the object and NFS stuff as well onto my server as well. I only run SMB, so I don;'t want lots of other stuff installing as well .. --> Running transaction check ---> Package gpfs.protocols-support.noarch 0:4.2.0-0 will be installed --> Processing Dependency: spectrum-scale-object >= 4.2.0 for package: gpfs.protocols-support-4.2.0-0.noarch --> Processing Dependency: nfs-ganesha >= 2.2 for package: gpfs.protocols-support-4.2.0-0.noarch --> Running transaction check ---> Package gpfs.protocols-support.noarch 0:4.2.0-0 will be installed --> Processing Dependency: spectrum-scale-object >= 4.2.0 for package: gpfs.protocols-support-4.2.0-0.noarch ---> Package nfs-ganesha.x86_64 0:2.3.0-1.el7 will be installed --> Processing Dependency: libntirpc.so.1.3(NTIRPC_1.3.1)(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 --> Processing Dependency: libntirpc.so.1.3()(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 --> Processing Dependency: libjemalloc.so.1()(64bit) for package: nfs-ganesha-2.3.0-1.el7.x86_64 From: > on behalf of Mathias Dietz > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 16 December 2015 at 12:02 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] 4.2 & protocols (missing dependency?) Hi, you are right that python-ldap is a required dependency for 4.2 protocol nodes. Please make sure to have the gpfs.protocols-support-4.2.0-0.noarch RPM installed on protocol nodes because this package will enforce the dependencies. >> rpm -qi gpfs.protocols-support-4.2.0-0.noarch Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. [root at p8-10-rhel-71be-01 ~]# rpm -qi gpfs.protocols-support-4.2.0-0.noarch --requires Name : gpfs.protocols-support Version : 4.2.0 Release : 0 Architecture: noarch Install Date: Wed 16 Dec 2015 07:56:42 PM CET Group : System Environment/Base Size : 0 License : (C) COPYRIGHT International Business Machines Corp. 2015 Signature : (none) Source RPM : gpfs.protocols-support-4.2.0-0.src.rpm Build Date : Sat 14 Nov 2015 12:20:07 AM CET Build Host : bldlnx84.pok.stglabs.ibm.com Relocations : (not relocatable) Summary : gpfs protocol dependencies Description : This package includes the dependency list for all the protocols to enforce that all relevant Spectrum Scale protocol packages are installed. If this package is not installed "mmchnode" will fail with an appropriate message. gpfs.base >= 4.2.0 nfs-ganesha >= 2.2 gpfs.smb >= 4.2.0_gpfs spectrum-scale-object >= 4.2.0 python-ldap rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1 Mit freundlichen Gr??en / Kind regards Mathias Dietz Spectrum Scale Development System Health Team - Scrum Master IBM Certified Software Engineer ---------------------------------------------------------------------------------------------------------- IBM Deutschland Hechtsheimer Str. 2 55131 Mainz Phone: +49-6131-84-2027 Mobile: +49-15152801035 E-Mail: mdietz at de.ibm.com ---------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz, Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Simon Thompson (Research Computing - IT Services)"