From r.sobey at imperial.ac.uk Fri Sep 1 09:45:24 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 1 Sep 2017 08:45:24 +0000 Subject: [gpfsug-discuss] GPFS GUI Nodes > NSD no data Message-ID: For some time now if I go into the GUI, select Monitoring > Nodes > NSD Server Nodes, the only columns with good data are Name, State and NSD Count. Everything else e.g. Avg Disk Wait Read is listed "N/A". Is this another config option I need to enable? It's been bugging me for a while, I don't think I've seen it work since 4.2.1 which was the first time I saw the GUI. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From bart.vandamme at sdnsquare.com Fri Sep 1 10:30:59 2017 From: bart.vandamme at sdnsquare.com (Bart Van Damme) Date: Fri, 1 Sep 2017 11:30:59 +0200 Subject: [gpfsug-discuss] SMB2 leases - oplocks - growing files Message-ID: We are a company located in Belgium that mainly implements spectrum scale clusters in the Media and broadcasting industry. Currently we have a customer who wants to export the scale file system over samba 4.5 and 4.6. In these versions the SMB2 leases are activated by default for enhancing the oplocks system. The problem is when this option is not disabled Adobe (and probably Windows) is not notified the size of the file have changed, resulting that reading growing file in Adobe is not working, the timeline is not updated. Does anybody had this issues before and know how to solve it. This is the smb.conf file: ============================ # Global options smb2 leases = yes client use spnego = yes clustering = yes unix extensions = no mangled names = no ea support = yes store dos attributes = yes map readonly = no map archive = yes map system = no force unknown acl user = yes obey pam restrictions = no deadtime = 480 disable netbios = yes server signing = disabled server min protocol = SMB2 smb encrypt = off # We do not allow guest usage. guest ok = no guest account = nobody map to guest = bad user # disable printing load printers = no printing = bsd printcap name = /dev/null disable spoolss = yes # log settings log file = /var/log/samba/log.%m # max 500KB per log file, then rotate max log size = 500 log level = 1 passdb:1 auth:1 winbind:1 idmap:1 #============ Share Definitions ============ [pfs] comment = GPFS path = /gpfs/pfs valid users = @ug_numpr writeable = yes inherit permissions = yes create mask = 664 force create mode = 664 nfs4:chown = yes nfs4:acedup = merge nfs4:mode = special fileid:algorithm = fsname vfs objects = shadow_copy2 gpfs fileid full_audit full_audit:prefix = %u|%I|%m|%S full_audit:success = rename unlink rmdir full_audit:failure = none full_audit:facility = local6 full_audit:priority = NOTICE shadow:fixinodes = yes gpfs:sharemodes = yes gpfs:winattr = yes gpfs:leases = no locking = yes posix locking = yes oplocks = yes kernel oplocks = no Grtz, Bart *Bart Van Damme * *Customer Project Manager* *SDNsquare* Technologiepark 3, 9052 Zwijnaarde, Belgium www.sdnsquare.com T: + 32 9 241 56 01 <09%20241%2056%2001> M: + 32 496 59 23 09 *This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. 
Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email.* Virusvrij. www.avast.com <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Sep 1 14:36:56 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 1 Sep 2017 13:36:56 +0000 Subject: [gpfsug-discuss] GPFS GUI Nodes > NSD no data In-Reply-To: References: Message-ID: Resolved this, guessed at changing GPFSNSDDisk.period to 5. From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 01 September 2017 09:45 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] GPFS GUI Nodes > NSD no data For some time now if I go into the GUI, select Monitoring > Nodes > NSD Server Nodes, the only columns with good data are Name, State and NSD Count. Everything else e.g. Avg Disk Wait Read is listed "N/A". Is this another config option I need to enable? It's been bugging me for a while, I don't think I've seen it work since 4.2.1 which was the first time I saw the GUI. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Fri Sep 1 21:56:25 2017 From: ewahl at osc.edu (Edward Wahl) Date: Fri, 1 Sep 2017 16:56:25 -0400 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? Message-ID: <20170901165625.6e4edd4c@osc.edu> Howdy. Just noticed this change to min RDMA packet size and I don't seem to see it in any patch notes. Maybe I just skipped the one where this changed? mmlsconfig verbsRdmaMinBytes verbsRdmaMinBytes 16384 (in case someone thinks we changed it) [root at proj-nsd01 ~]# mmlsconfig |grep verbs verbsRdma enable verbsRdma disable verbsRdmasPerConnection 14 verbsRdmasPerNode 1024 verbsPorts mlx5_3/1 verbsPorts mlx4_0 verbsPorts mlx5_0 verbsPorts mlx5_0 mlx5_1 verbsPorts mlx4_1/1 verbsPorts mlx4_1/2 Oddly I also see this in config, though I've seen these kinds of things before. mmdiag --config |grep verbsRdmaMinBytes verbsRdmaMinBytes 8192 We're on a recent efix. Current GPFS build: "4.2.2.3 efix21 (1028007)". -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From akers at vt.edu Fri Sep 1 22:06:15 2017 From: akers at vt.edu (Joshua Akers) Date: Fri, 01 Sep 2017 21:06:15 +0000 Subject: [gpfsug-discuss] Quorum managers Message-ID: Hi all, I was wondering how most people set up quorum managers. We historically had physical admin nodes be the quorum managers, but are switching to a virtualized admin services infrastructure. We have been choosing a few compute nodes to act as quorum managers in our client clusters, but have considered using virtual machines instead. Has anyone else done this? Regards, Josh -- *Joshua D. Akers* *HPC Team Lead* NI&S Systems Support (MC0214) 1700 Pratt Drive Blacksburg, VA 24061 540-231-9506 -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Fri Sep 1 23:42:55 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 01 Sep 2017 22:42:55 +0000 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? 
In-Reply-To: <20170901165625.6e4edd4c@osc.edu> References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi Ed, yes the defaults for that have changed for customers who had not overridden the default settings. the reason we did this was that many systems in the field including all ESS systems that come pre-tuned where manually changed to 8k from the 16k default due to better performance that was confirmed in multiple customer engagements and tests with various settings , therefore we change the default to what it should be in the field so people are not bothered to set it anymore (simplification) or get benefits by changing the default to provides better performance. all this happened when we did the communication code overhaul that did lead to significant (think factors) of improved RPC performance for RDMA and VERBS workloads. there is another round of significant enhancements coming soon , that will make even more parameters either obsolete or change some of the defaults for better out of the box performance. i see that we should probably enhance the communication of this changes, not that i think this will have any negative effect compared to what your performance was with the old setting i am actually pretty confident that you get better performance with the new code, but by setting parameters back to default on most 'manual tuned' probably makes your system even faster. if you have a Scale Client on 4.2.3+ you really shouldn't have anything set beside maxfilestocache, pagepool, workerthreads and potential prefetch , if you are a protocol node, this and settings specific to an export (e.g. SMB, NFS set some special settings) , pretty much everything else these days should be set to default so the code can pick the correct parameters., if its not and you get better performance by manual tweaking something i like to hear about it. on the communication side in the next release will eliminate another set of parameters that are now 'auto set' and we plan to work on NSD next. i presented various slides about the communication and simplicity changes in various forums, latest public non NDA slides i presented are here --> http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf hope this helps . Sven On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl wrote: > Howdy. Just noticed this change to min RDMA packet size and I don't seem > to > see it in any patch notes. Maybe I just skipped the one where this > changed? > > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of things > before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 <(614)%20292-9302> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
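A hedged, minimal sketch of how the "set it back to default" advice above can be applied to a single attribute (the attribute is only an example, and some verbs settings take effect only after GPFS is restarted on the affected nodes, so check the mmchconfig man page for your release first):

mmlsconfig verbsRdmaMinBytes                # value recorded in the cluster configuration
mmdiag --config | grep verbsRdmaMinBytes    # value the running daemon is using on this node
mmchconfig verbsRdmaMinBytes=DEFAULT        # remove a manual override and fall back to the shipped default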
URL: From truongv at us.ibm.com Fri Sep 1 23:56:23 2017 From: truongv at us.ibm.com (Truong Vu) Date: Fri, 1 Sep 2017 18:56:23 -0400 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: Message-ID: The discrepancy between the mmlsconfig view and mmdiag has been fixed in GFPS 4.2.3 version. Note, mmdiag reports the correct default value. Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/01/2017 06:43 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=xZHUN9ZlFjvgBmBB8wnX2cQDQQV42R_q-xHubNA3JBM&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS GUI Nodes > NSD no data (Sobey, Richard A) 2. Change to default for verbsRdmaMinBytes? (Edward Wahl) 3. Quorum managers (Joshua Akers) 4. Re: Change to default for verbsRdmaMinBytes? (Sven Oehme) ---------------------------------------------------------------------- Message: 1 Date: Fri, 1 Sep 2017 13:36:56 +0000 From: "Sobey, Richard A" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS GUI Nodes > NSD no data Message-ID: Content-Type: text/plain; charset="us-ascii" Resolved this, guessed at changing GPFSNSDDisk.period to 5. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 01 September 2017 09:45 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] GPFS GUI Nodes > NSD no data For some time now if I go into the GUI, select Monitoring > Nodes > NSD Server Nodes, the only columns with good data are Name, State and NSD Count. Everything else e.g. Avg Disk Wait Read is listed "N/A". Is this another config option I need to enable? It's been bugging me for a while, I don't think I've seen it work since 4.2.1 which was the first time I saw the GUI. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170901_2a4162e9_attachment-2D0001.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=jcPGl5zwtQFMbnEmBpNErsD43uwoVeKgKk_8j7ZeCJY&e= > ------------------------------ Message: 2 Date: Fri, 1 Sep 2017 16:56:25 -0400 From: Edward Wahl To: gpfsug main discussion list Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? Message-ID: <20170901165625.6e4edd4c at osc.edu> Content-Type: text/plain; charset="US-ASCII" Howdy. Just noticed this change to min RDMA packet size and I don't seem to see it in any patch notes. Maybe I just skipped the one where this changed? 
mmlsconfig verbsRdmaMinBytes verbsRdmaMinBytes 16384 (in case someone thinks we changed it) [root at proj-nsd01 ~]# mmlsconfig |grep verbs verbsRdma enable verbsRdma disable verbsRdmasPerConnection 14 verbsRdmasPerNode 1024 verbsPorts mlx5_3/1 verbsPorts mlx4_0 verbsPorts mlx5_0 verbsPorts mlx5_0 mlx5_1 verbsPorts mlx4_1/1 verbsPorts mlx4_1/2 Oddly I also see this in config, though I've seen these kinds of things before. mmdiag --config |grep verbsRdmaMinBytes verbsRdmaMinBytes 8192 We're on a recent efix. Current GPFS build: "4.2.2.3 efix21 (1028007)". -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ------------------------------ Message: 3 Date: Fri, 01 Sep 2017 21:06:15 +0000 From: Joshua Akers To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Quorum managers Message-ID: Content-Type: text/plain; charset="utf-8" Hi all, I was wondering how most people set up quorum managers. We historically had physical admin nodes be the quorum managers, but are switching to a virtualized admin services infrastructure. We have been choosing a few compute nodes to act as quorum managers in our client clusters, but have considered using virtual machines instead. Has anyone else done this? Regards, Josh -- *Joshua D. Akers* *HPC Team Lead* NI&S Systems Support (MC0214) 1700 Pratt Drive Blacksburg, VA 24061 540-231-9506 -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170901_a49947db_attachment-2D0001.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=Gag0raQbp7KZAyINlnmuxlnpjboo9XOWO3dDL2HCsZo&e= > ------------------------------ Message: 4 Date: Fri, 01 Sep 2017 22:42:55 +0000 From: Sven Oehme To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? Message-ID: Content-Type: text/plain; charset="utf-8" Hi Ed, yes the defaults for that have changed for customers who had not overridden the default settings. the reason we did this was that many systems in the field including all ESS systems that come pre-tuned where manually changed to 8k from the 16k default due to better performance that was confirmed in multiple customer engagements and tests with various settings , therefore we change the default to what it should be in the field so people are not bothered to set it anymore (simplification) or get benefits by changing the default to provides better performance. all this happened when we did the communication code overhaul that did lead to significant (think factors) of improved RPC performance for RDMA and VERBS workloads. there is another round of significant enhancements coming soon , that will make even more parameters either obsolete or change some of the defaults for better out of the box performance. i see that we should probably enhance the communication of this changes, not that i think this will have any negative effect compared to what your performance was with the old setting i am actually pretty confident that you get better performance with the new code, but by setting parameters back to default on most 'manual tuned' probably makes your system even faster. if you have a Scale Client on 4.2.3+ you really shouldn't have anything set beside maxfilestocache, pagepool, workerthreads and potential prefetch , if you are a protocol node, this and settings specific to an export (e.g. 
SMB, NFS set some special settings) , pretty much everything else these days should be set to default so the code can pick the correct parameters., if its not and you get better performance by manual tweaking something i like to hear about it. on the communication side in the next release will eliminate another set of parameters that are now 'auto set' and we plan to work on NSD next. i presented various slides about the communication and simplicity changes in various forums, latest public non NDA slides i presented are here --> https://urldefense.proofpoint.com/v2/url?u=http-3A__files.gpfsug.org_presentations_2017_Manchester_08-5FResearch-5FTopics.pdf&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=8c_55Ld_iAC2sr_QU0cyGiOiyU7Z9NjcVknVuRpRIlk&e= hope this helps . Sven On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl wrote: > Howdy. Just noticed this change to min RDMA packet size and I don't seem > to > see it in any patch notes. Maybe I just skipped the one where this > changed? > > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of things > before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 <(614)%20292-9302> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=xZHUN9ZlFjvgBmBB8wnX2cQDQQV42R_q-xHubNA3JBM&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170901_b75cfc74_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=LpVpXMgqE_LD-t_J7yfNwURUrdUR29TzWvjVTi18kpA&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=xZHUN9ZlFjvgBmBB8wnX2cQDQQV42R_q-xHubNA3JBM&e= End of gpfsug-discuss Digest, Vol 68, Issue 2 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Sat Sep 2 10:35:34 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Sat, 2 Sep 2017 09:35:34 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Message-ID: Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. Pid=5134 Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From truongv at us.ibm.com Sat Sep 2 12:40:15 2017 From: truongv at us.ibm.com (Truong Vu) Date: Sat, 2 Sep 2017 07:40:15 -0400 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: Message-ID: The dates that have the zone abbreviation are from the scripts which use the OS date command. The daemon has its own format. This inconsistency has been address in 4.2.2. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/02/2017 07:00 AM Subject: gpfsug-discuss Digest, Vol 68, Issue 4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=pd3-zi8UQxVOjxOYxqbuaFSvv_71WENUBJsw0KUV3ro&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Date formats inconsistent mmfs.log (Sobey, Richard A) ---------------------------------------------------------------------- Message: 1 Date: Sat, 2 Sep 2017 09:35:34 +0000 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Message-ID: Content-Type: text/plain; charset="us-ascii" Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. 
Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. Pid=5134 Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170902_4f65f336_attachment-2D0001.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=fNT71mM8obJ9rwxzm3Uzxw4mayi2pQg1u950E1raYK4&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=pd3-zi8UQxVOjxOYxqbuaFSvv_71WENUBJsw0KUV3ro&e= End of gpfsug-discuss Digest, Vol 68, Issue 4 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From john.hearns at asml.com Mon Sep 4 08:43:59 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 4 Sep 2017 07:43:59 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: Message-ID: Richard, The date format changed at an update level. We recently updated to 4.2.3 and when you run mmchconfig release=LATEST you are prompted to confirm that the new log format can be used. I guess you might not have cut all nodes over yet on your update over the weekend? Cut and paste from the documentation: mmfsLogTimeStampISO8601={yes | no} Setting this parameter to no allows the cluster to continue running with the earlier log time stamp format. For more information, see Security mode. * Set mmfsLogTimeStampISO8061 to no if you save log information and you are not yet ready to switch to the new log time stamp format. After you complete the migration, you can change the log time stamp format at any time with the mmchconfig command. * Omit this parameter if you are ready to switch to the new format. The default value is yes From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Saturday, September 02, 2017 11:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. 
Pid=5134 Cheers Richard -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Sep 4 09:05:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 4 Sep 2017 08:05:10 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: , Message-ID: Ah. I'm running 4.2.3 but haven't changed the release level. I'll get that sorted out. Thanks for the replies! Get Outlook for Android ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of John Hearns Sent: Monday, September 4, 2017 8:43:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Date formats inconsistent mmfs.log Richard, The date format changed at an update level. We recently updated to 4.2.3 and when you run mmchconfig release=LATEST you are prompted to confirm that the new log format can be used. I guess you might not have cut all nodes over yet on your update over the weekend? Cut and paste from the documentation: mmfsLogTimeStampISO8601={yes | no} Setting this parameter to no allows the cluster to continue running with the earlier log time stamp format. For more information, see Security mode. ? Set mmfsLogTimeStampISO8061 to no if you save log information and you are not yet ready to switch to the new log time stamp format. After you complete the migration, you can change the log time stamp format at any time with the mmchconfig command. ? Omit this parameter if you are ready to switch to the new format. The default value is yes From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Saturday, September 02, 2017 11:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. 
Pid=5134 Cheers Richard -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Mon Sep 4 13:02:49 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Mon, 4 Sep 2017 12:02:49 +0000 Subject: [gpfsug-discuss] Looking for Use-Cases with Spectrum Scale / ESS with vRanger & VMware Message-ID: An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Mon Sep 4 17:48:20 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Mon, 4 Sep 2017 16:48:20 +0000 Subject: [gpfsug-discuss] Use AFM for migration of many small files Message-ID: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> Hello, We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and I need to look elsewhere to get better performance for prefetch of many smaller files? We will migrate several filesets in parallel, but still with individual filesets up to 350TB in size 150MB/s isn?t fun. Also just about 150 files/s seconds looks poor. The setup is quite new, hence there may be other places to look at. It?s all RHEL7 an spectrum scale 4.2.2-3 on the afm cache. Thank you, Heiner --, Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From vpuvvada at in.ibm.com Tue Sep 5 15:27:21 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 5 Sep 2017 19:57:21 +0530 Subject: [gpfsug-discuss] Use AFM for migration of many small files In-Reply-To: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> References: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> Message-ID: Which version of Spectrum Scale ? What is the fileset mode ? >We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here >I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. How was the performance measured ? 
If parallel IO is enabled, AFM uses multiple gateway nodes to prefetch the large files (if file size if more than 1GB). Performance difference between small and lager file is huge (1000MB - 150MB = 850MB) here, and generally it is not the case. How many files were present in list file for prefetch ? Could you also share full internaldump from the gateway node ? >I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few >read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. AFM prefetches the files on multiple threads. Default flush threads for prefetch are 36 (fileset.afmNumFlushThreads (default 4) + afmNumIOFlushThreads (default 32)). >Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and >I need to look elsewhere to get better performance for prefetch of many smaller files? See above, AFM reads files on multiple threads parallelly. Try increasing the afmNumFlushThreads on fileset and verify if it improves the performance. ~Venkat (vpuvvada at in.ibm.com) From: "Billich Heinrich Rainer (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 09/04/2017 10:18 PM Subject: [gpfsug-discuss] Use AFM for migration of many small files Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and I need to look elsewhere to get better performance for prefetch of many smaller files? We will migrate several filesets in parallel, but still with individual filesets up to 350TB in size 150MB/s isn?t fun. Also just about 150 files/s seconds looks poor. The setup is quite new, hence there may be other places to look at. It?s all RHEL7 an spectrum scale 4.2.2-3 on the afm cache. Thank you, Heiner --, Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://urldefense.proofpoint.com/v2/url?u=https-3A__www.psi.ch&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=eHcVdovN10-m-Qk0Ln2qvol3pkKNFwrzz2wgf1zXVXE&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=LbRyuSM_djs0FDXr27hPottQHAn3OGcivpyRcIDBN3U&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kenneth.waegeman at ugent.be Wed Sep 6 12:55:20 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Wed, 6 Sep 2017 13:55:20 +0200 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi Sven, I see two parameters that we have set to non-default values that are not in your list of options still to configure. verbsRdmasPerConnection (256) and socketMaxListenConnections (1024) I remember we had to set socketMaxListenConnections because our cluster consist of +550 nodes. Are these settings still needed, or is this also tackled in the code? Thank you!! Cheers, Kenneth On 02/09/17 00:42, Sven Oehme wrote: > Hi Ed, > > yes the defaults for that have changed for customers who had not > overridden the default settings. the reason we did this was that many > systems in the field including all ESS systems that come pre-tuned > where manually changed to 8k from the 16k default due to better > performance that was confirmed in multiple customer engagements and > tests with various settings , therefore we change the default to what > it should be in the field so people are not bothered to set it anymore > (simplification) or get benefits by changing the default to provides > better performance. > all this happened when we did the communication code overhaul that did > lead to significant (think factors) of improved RPC performance for > RDMA and VERBS workloads. > there is another round of significant enhancements coming soon , that > will make even more parameters either obsolete or change some of the > defaults for better out of the box performance. > i see that we should probably enhance the communication of this > changes, not that i think this will have any negative effect compared > to what your performance was with the old setting i am actually pretty > confident that you get better performance with the new code, but by > setting parameters back to default on most 'manual tuned' probably > makes your system even faster. > if you have a Scale Client on 4.2.3+ you really shouldn't have > anything set beside maxfilestocache, pagepool, workerthreads and > potential prefetch , if you are a protocol node, this and settings > specific to an export (e.g. SMB, NFS set some special settings) , > pretty much everything else these days should be set to default so the > code can pick the correct parameters., if its not and you get better > performance by manual tweaking something i like to hear about it. > on the communication side in the next release will eliminate another > set of parameters that are now 'auto set' and we plan to work on NSD > next. > i presented various slides about the communication and simplicity > changes in various forums, latest public non NDA slides i presented > are here --> > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf > > hope this helps . > > Sven > > > > On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl > wrote: > > Howdy. Just noticed this change to min RDMA packet size and I > don't seem to > see it in any patch notes. Maybe I just skipped the one where > this changed? 
> > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of > things before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Sep 6 13:22:41 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 6 Sep 2017 14:22:41 +0200 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Sep 6 13:29:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 6 Sep 2017 12:29:44 +0000 Subject: [gpfsug-discuss] Save the date! GPFS-UG meeting at SC17 - Sunday November 12th Message-ID: <7838054B-8A46-46A0-8A53-81E3049B4AE7@nuance.com> The 2017 Supercomputing conference is only 2 months away, and here?s a reminder to come early and attend the GPFS user group meeting. The meeting is tentatively scheduled from the afternoon of Sunday, November 12th. Exact location and times are still being discussed. If you have an interest in presenting at the user group meeting, please let us know. More details in the coming weeks. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Sep 6 13:35:45 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 06 Sep 2017 12:35:45 +0000 Subject: [gpfsug-discuss] filesets inside of filesets Message-ID: Today we have following fileset structure on our filesystem: /projects <-- gpfs filesystem /projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.: /projects/b1000 (b1000 has 10TB quota applied) /projects/b1000/backup (backup has 1TB quota applied) Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Sep 6 13:43:09 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 6 Sep 2017 12:43:09 +0000 Subject: [gpfsug-discuss] filesets inside of filesets In-Reply-To: References: Message-ID: Filesets in filesets are fine. BUT if you use scoped backups with TSM... 
Er Spectrum Protect, then there are restrictions on creating an IFS inside an IFS ... Simon

From: > on behalf of "damir.krstic at gmail.com" >
Reply-To: "gpfsug-discuss at spectrumscale.org" >
Date: Wednesday, 6 September 2017 at 13:35
To: "gpfsug-discuss at spectrumscale.org" >
Subject: [gpfsug-discuss] filesets inside of filesets

Today we have following fileset structure on our filesystem:
/projects <-- gpfs filesystem
/projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it
I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.:
/projects/b1000 (b1000 has 10TB quota applied)
/projects/b1000/backup (backup has 1TB quota applied)
Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rohwedder at de.ibm.com Wed Sep 6 13:51:47 2017
From: rohwedder at de.ibm.com (Markus Rohwedder)
Date: Wed, 6 Sep 2017 14:51:47 +0200
Subject: [gpfsug-discuss] filesets inside of filesets
In-Reply-To: References: Message-ID:

Hello Damir,
the files that belong to your fileset "backup" have a separate quota; it is not related to the quota in "b1000". There is no cumulative quota.
Fileset nesting may need other considerations as well, as in some cases filesets behave differently from simple directories:
-> For NFSv4 ACLs, inheritance stops at the fileset boundaries
-> Snapshots include the independent parent and the dependent children. Nested independent filesets are not included in a fileset snapshot.
-> Export protocols like NFS or SMB will cross fileset boundaries and just treat them like a directory.

Mit freundlichen Grüßen / Kind regards

Dr. Markus Rohwedder
Spectrum Scale GUI Development
Phone: +49 7034 6430190 IBM Deutschland
E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany
IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina Köderitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

From: Damir Krstic
To: gpfsug main discussion list
Date: 09/06/2017 02:36 PM
Subject: [gpfsug-discuss] filesets inside of filesets
Sent by: gpfsug-discuss-bounces at spectrumscale.org

Today we have following fileset structure on our filesystem:
/projects <-- gpfs filesystem
/projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it
I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.:
/projects/b1000 (b1000 has 10TB quota applied)
/projects/b1000/backup (backup has 1TB quota applied)
Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
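For completeness, a minimal sketch of the nesting Damir asked about, assuming the device name is projects and that quota enforcement is already enabled on it (mmchfs projects -Q yes); the fileset name below is illustrative, the paths and limits are taken from the original example:

mmcrfileset projects b1000_backup                    # a dependent fileset is enough to carry its own fileset quota
mmlinkfileset projects b1000_backup -J /projects/b1000/backup
mmsetquota projects:b1000_backup --block 1T:1T       # soft:hard block limits for the nested fileset

As noted above, data written under the backup junction then counts only against this quota, not against the 10TB quota of b1000.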
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1B378274.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Wed Sep 6 14:32:40 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 06 Sep 2017 13:32:40 +0000 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi, you still need both of them, but they are both on the list to be removed, the first is already integrated for the next major release, the 2nd we still work on. Sven On Wed, Sep 6, 2017 at 4:55 AM Kenneth Waegeman wrote: > Hi Sven, > > I see two parameters that we have set to non-default values that are not > in your list of options still to configure. > verbsRdmasPerConnection (256) and > socketMaxListenConnections (1024) > > I remember we had to set socketMaxListenConnections because our cluster > consist of +550 nodes. > > Are these settings still needed, or is this also tackled in the code? > > Thank you!! > > Cheers, > Kenneth > > > > On 02/09/17 00:42, Sven Oehme wrote: > > Hi Ed, > > yes the defaults for that have changed for customers who had not > overridden the default settings. the reason we did this was that many > systems in the field including all ESS systems that come pre-tuned where > manually changed to 8k from the 16k default due to better performance that > was confirmed in multiple customer engagements and tests with various > settings , therefore we change the default to what it should be in the > field so people are not bothered to set it anymore (simplification) or get > benefits by changing the default to provides better performance. > all this happened when we did the communication code overhaul that did > lead to significant (think factors) of improved RPC performance for RDMA > and VERBS workloads. > there is another round of significant enhancements coming soon , that will > make even more parameters either obsolete or change some of the defaults > for better out of the box performance. > i see that we should probably enhance the communication of this changes, > not that i think this will have any negative effect compared to what your > performance was with the old setting i am actually pretty confident that > you get better performance with the new code, but by setting parameters > back to default on most 'manual tuned' probably makes your system even > faster. > if you have a Scale Client on 4.2.3+ you really shouldn't have anything > set beside maxfilestocache, pagepool, workerthreads and potential prefetch > , if you are a protocol node, this and settings specific to an export > (e.g. SMB, NFS set some special settings) , pretty much everything else > these days should be set to default so the code can pick the correct > parameters., if its not and you get better performance by manual tweaking > something i like to hear about it. > on the communication side in the next release will eliminate another set > of parameters that are now 'auto set' and we plan to work on NSD next. 
> i presented various slides about the communication and simplicity changes > in various forums, latest public non NDA slides i presented are here --> > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf > > hope this helps . > > Sven > > > > On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl < ewahl at osc.edu> > wrote: > >> Howdy. Just noticed this change to min RDMA packet size and I don't >> seem to >> see it in any patch notes. Maybe I just skipped the one where this >> changed? >> >> mmlsconfig verbsRdmaMinBytes >> verbsRdmaMinBytes 16384 >> >> (in case someone thinks we changed it) >> >> [root at proj-nsd01 ~]# mmlsconfig |grep verbs >> verbsRdma enable >> verbsRdma disable >> verbsRdmasPerConnection 14 >> verbsRdmasPerNode 1024 >> verbsPorts mlx5_3/1 >> verbsPorts mlx4_0 >> verbsPorts mlx5_0 >> verbsPorts mlx5_0 mlx5_1 >> verbsPorts mlx4_1/1 >> verbsPorts mlx4_1/2 >> >> >> Oddly I also see this in config, though I've seen these kinds of things >> before. >> mmdiag --config |grep verbsRdmaMinBytes >> verbsRdmaMinBytes 8192 >> >> We're on a recent efix. >> Current GPFS build: "4.2.2.3 efix21 (1028007)". >> >> -- >> >> Ed Wahl >> Ohio Supercomputer Center >> 614-292-9302 <%28614%29%20292-9302> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Wed Sep 6 17:16:18 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Wed, 6 Sep 2017 16:16:18 +0000 Subject: [gpfsug-discuss] Use AFM for migration of many small files Message-ID: <7D6EFD03-5D74-4A7B-A0E8-2AD41B050E15@psi.ch> Hello Venkateswara, Edward, Thank you for the comments on how to speed up AFM prefetch with small files. We run 4.2.2-3 and the AFM mode is RO and we have just a single gateway, i.e. no parallel reads for large files. We will try to increase the value of afmNumFlushThreads. It wasn?t clear to me that these threads do read from home, too - at least for prefetch. First I will try a plain NFS mount and see how parallel reads of many small files scale the throughput. Next I will try AFM prefetch. I don?t do nice benchmarking, just watching dstat output. We prefetch 100?000 files in one bunch, so there is ample time to observe. The basic issue is that we get just about 45MB/s for sequential read of many 1000 files with 1MB per file on the home cluster. I.e. we read one file at a time before we switch to the next. This is no surprise. Each read takes about 20ms to complete, so at max we get 50 reads of 1MB per second. We?ve seen this on classical raid storage and on DSS/ESS systems. It?s likely just the physics of spinning disks and the fact that we do one read at a time and don?t allow any parallelism. We wait for one or two I/Os to single disks to complete before we continue With larger files prefetch jumps in and fires many reads in parallel ? To get 1?000MB/s I need to do 1?000 read/s and need to have ~20 reads in progress in parallel all the time ? we?ll see how close we get to 1?000MB/s with ?many small files?. 
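(A rough recipe for that plain-NFS parallel-read test, with the path and the degree of parallelism as placeholders only: on a client that has the home export mounted,

cd /mnt/home-nfs/migration-test                        # plain NFS mount of the home export
find . -type f | xargs -P 20 -n 50 cat > /dev/null     # ~20 reads in flight, i.e. roughly 1000 x 1MB reads/s at 20 ms each

and watch dstat or nfsiostat while it runs; varying -P shows how throughput scales with the number of outstanding reads.)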
Kind regards, Heiner
--
Paul Scherrer Institut
Science IT
Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02
https://www.psi.ch

From stijn.deweirdt at ugent.be Wed Sep 6 18:13:48 2017
From: stijn.deweirdt at ugent.be (Stijn De Weirdt)
Date: Wed, 6 Sep 2017 19:13:48 +0200
Subject: [gpfsug-discuss] mixed verbsRdmaSend
Message-ID:

hi all, what is the expected behaviour of a mixed verbsRdmaSend setup: some nodes enabled, most disabled. we have some nodes that have a very high iops workload, but most of the cluster of 500+ nodes does not have such a use case. we enabled verbsRdmaSend on the managers/quorum nodes (<10) and on the few (<10) clients with this workload, but not on the others (500+). it seems to work out fine, but is this acceptable as a config? (the docs mention that enabling verbsRdmaSend on >100 nodes might lead to errors). the nodes use IPoIB as the IP network, and running with verbsRdmaSend disabled on all nodes leads to an unstable cluster (TX errors (<1 error in 1M packets) on some clients leading to gpfs expelling nodes etc). (we still need to open a case with Mellanox to investigate further) many thanks, stijn

From gcorneau at us.ibm.com Thu Sep 7 00:30:23 2017
From: gcorneau at us.ibm.com (Glen Corneau)
Date: Wed, 6 Sep 2017 18:30:23 -0500
Subject: [gpfsug-discuss] Happy 20th birthday GPFS !!
Message-ID:

Sorry I missed the anniversary of your conception (announcement letter) back on August 26th, so I hope you'll accept my belated congratulations on this long and exciting journey!
https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS297-318
I remember your parent, PIOFS, as well! Ahh the fun times.
------------------
Glen Corneau
Power Systems
Washington Systems Center
gcorneau at us.ibm.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: image/jpeg Size: 26117 bytes Desc: not available URL:

From xhejtman at ics.muni.cz Thu Sep 7 16:07:20 2017
From: xhejtman at ics.muni.cz (Lukas Hejtmanek)
Date: Thu, 7 Sep 2017 17:07:20 +0200
Subject: [gpfsug-discuss] Overwriting migrated files
Message-ID: <20170907150720.h3t5fowvdlibvik4@ics.muni.cz>

Hello, we have files of about 100GB per file. Many of these files are migrated to tapes. (GPFS+TSM, tape storage is an external pool and dsmmigrate, dsmrecall are in place). These files are images from the Bacula backup system. When Bacula wants to reuse some of the images, it needs to truncate the file to 64kB and overwrite it. Is there a way not to recall the whole 100GB from tape just to truncate the file?

I tried to do a partial recall:
dsmrecall -D -size=65k Vol03797
after recall processing finished, I tried to truncate the file using:
dd if=/dev/zero of=Vol03797 count=0 bs=64k seek=1
which caused further recall of the whole file:
$ dsmls Vol03797
IBM Spectrum Protect Command Line Space Management Client Interface Client Version 8, Release 1, Level 2.0 Client date/time: 09/07/2017 15:01:59 (c) Copyright by IBM Corporation and other(s) 1990, 2017. All Rights Reserved.
ActS ResS ResB FSt FName
107380819676 10485760 31373312 m (p) Vol03797
and ResB size has been growing to 107380819676. After dd finished:
dsmls Vol03797
IBM Spectrum Protect Command Line Space Management Client Interface Client Version 8, Release 1, Level 2.0 Client date/time: 09/07/2017 15:08:03 (c) Copyright by IBM Corporation and other(s) 1990, 2017. All Rights Reserved.
ActS ResS ResB FSt FName 65536 65536 64 r Vol03797 Is there another way to truncate the file and drop whole migrated part? -- Luk?? Hejtm?nek From john.hearns at asml.com Thu Sep 7 16:15:00 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 7 Sep 2017 15:15:00 +0000 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig Message-ID: If I have an AFM setup where the home is located on a generic NFS share, let's say server:/volume/share When I come ot set this up does it make sense to run mmafmconfig on /volume/share ? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Thu Sep 7 16:33:58 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Thu, 7 Sep 2017 15:33:58 +0000 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig In-Reply-To: References: Message-ID: I think you need to configure a gateway node (use mmchnode to change an existing node class to gateway) Then use mmafmconfig to setup export server maps on the gateway node. e.g. mmafmconfig -add "mapping1" -export-map "nfsServerIP"/"GatewayNode" (double quotes not required) mafmconfig show all Map name: mapping1 Export server map: IP/GatewayNode From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 07 September 2017 16:15 To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig If I have an AFM setup where the home is located on a generic NFS share, let's say server:/volume/share When I come ot set this up does it make sense to run mmafmconfig on /volume/share ? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. 
Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt.

From john.hearns at asml.com Thu Sep 7 16:52:19 2017
From: john.hearns at asml.com (John Hearns)
Date: Thu, 7 Sep 2017 15:52:19 +0000
Subject: [gpfsug-discuss] Deletion of a pcache snapshot?
Message-ID:

Firmly lining myself up for a smack round the chops with a wet haddock...

I try to delete an AFM cache fileset which I created a few days ago (it has an NFS home).

mmdelfileset responds that:
Fileset obfuscated has 1 fileset snapshot(s).

When I try to delete the snapshot:
Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user.

I find this reference, which is about as useful as a wet haddock:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm

The advice of the gallery is sought, please.

From janusz.malka at desy.de Thu Sep 7 20:23:36 2017
From: janusz.malka at desy.de (Malka, Janusz)
Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST)
Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot?
In-Reply-To:
References:
Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de>

I had a similar issue, I had to recover the connection to home.

From: "John Hearns"
To: "gpfsug main discussion list"
Sent: Thursday, 7 September, 2017 17:52:19
Subject: [gpfsug-discuss] Deletion of a pcache snapshot?

Firmly lining myself up for a smack round the chops with a wet haddock...

I try to delete an AFM cache fileset which I created a few days ago (it has an NFS home).

mmdelfileset responds that:
Fileset obfuscated has 1 fileset snapshot(s).

When I try to delete the snapshot:
Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user.
I find this reference, which is about as useful as a wet haddock: [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Thu Sep 7 22:16:34 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 7 Sep 2017 21:16:34 +0000 Subject: [gpfsug-discuss] SMB2 leases - oplocks - growing files In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 03:11:48 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 22:11:48 -0400 Subject: [gpfsug-discuss] mmfsd write behavior Message-ID: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Hi Everyone, This is something that's come up in the past and has recently resurfaced with a project I've been working on, and that is-- it seems to me as though mmfsd never attempts to flush the cache of the block devices its writing to (looking at blktrace output seems to confirm this). Is this actually the case? I've looked at the gpl headers for linux and I don't see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or REQ_FLUSH. I'm sure there's other ways to trigger this behavior that GPFS may very well be using that I've missed. That's why I'm asking :) I figure with FPO being pushed as an HDFS replacement using commodity drives this feature has *got* to be in the code somewhere. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Sep 8 03:55:14 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 08 Sep 2017 02:55:14 +0000 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: I am not sure what exactly you are looking for but all blockdevices are opened with O_DIRECT , we never cache anything on this layer . 
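just to be clear what that means from user space - O_DIRECT is the open(2) flag, it bypasses the linux page cache, nothing more. a quick sketch of the distinction (illustration only, this is not the mmfsd code path, and the file name is made up):

import mmap, os

# sketch: write through O_DIRECT so the page cache is bypassed. O_DIRECT needs
# aligned buffers, hence the page-aligned anonymous mmap. use a scratch file on
# a filesystem that supports O_DIRECT (tmpfs does not).
path = "odirect.test"                   # made-up scratch file
buf = mmap.mmap(-1, 4096)               # page-aligned 4 KiB buffer
buf.write(b"A" * 4096)

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
try:
    os.write(fd, buf)                   # bypasses the page cache
    os.fdatasync(fd)                    # the separate, explicit call that asks the
                                        # stack to flush a volatile write cache below
finally:
    os.close(fd)

the direct write itself does not ask the device to flush its volatile cache, that would be the separate fdatasync (or a flush/FUA request at the bio layer).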
On Thu, Sep 7, 2017, 7:11 PM Aaron Knister wrote: > Hi Everyone, > > This is something that's come up in the past and has recently resurfaced > with a project I've been working on, and that is-- it seems to me as > though mmfsd never attempts to flush the cache of the block devices its > writing to (looking at blktrace output seems to confirm this). Is this > actually the case? I've looked at the gpl headers for linux and I don't > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > GPFS may very well be using that I've missed. That's why I'm asking :) > > I figure with FPO being pushed as an HDFS replacement using commodity > drives this feature has *got* to be in the code somewhere. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 04:05:42 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 23:05:42 -0400 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: Thanks Sven. I didn't think GPFS itself was caching anything on that layer, but it's my understanding that O_DIRECT isn't sufficient to force I/O to be flushed (e.g. the device itself might have a volatile caching layer). Take someone using ZFS zvol's as NSDs. I can write() all day log to that zvol (even with O_DIRECT) but there is absolutely no guarantee those writes have been committed to stable storage and aren't just sitting in RAM until an fsync() occurs (or some other bio function that causes a flush). I also don't believe writing to a SATA drive with O_DIRECT will force cache flushes of the drive's writeback cache.. although I just tested that one and it seems to actually trigger a scsi cache sync. Interesting. -Aaron On 9/7/17 10:55 PM, Sven Oehme wrote: > I am not sure what exactly you are looking for but all blockdevices are > opened with O_DIRECT , we never cache anything on this layer . > > > On Thu, Sep 7, 2017, 7:11 PM Aaron Knister > wrote: > > Hi Everyone, > > This is something that's come up in the past and has recently resurfaced > with a project I've been working on, and that is-- it seems to me as > though mmfsd never attempts to flush the cache of the block devices its > writing to (looking at blktrace output seems to confirm this). Is this > actually the case? I've looked at the gpl headers for linux and I don't > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > GPFS may very well be using that I've missed. That's why I'm asking :) > > I figure with FPO being pushed as an HDFS replacement using commodity > drives this feature has *got* to be in the code somewhere. 
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From aaron.s.knister at nasa.gov Fri Sep 8 04:26:02 2017
From: aaron.s.knister at nasa.gov (Aaron Knister)
Date: Thu, 7 Sep 2017 23:26:02 -0400
Subject: [gpfsug-discuss] Happy 20th birthday GPFS !!
In-Reply-To:
References:
Message-ID: <4a9feeb2-bb9d-8c9a-e506-926d8537cada@nasa.gov>

Sounds like celebratory cake is in order for the users group in a few weeks ;)

On 9/6/17 7:30 PM, Glen Corneau wrote:
> Sorry I missed the anniversary of your conception (announcement letter)
> back on August 26th, so I hope you'll accept my belated congratulations
> on this long and exciting journey!
>
> https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS297-318
>
> I remember your parent, PIOFS, as well! Ahh, the fun times.
> ------------------
> Glen Corneau
> Power Systems
> Washington Systems Center
> gcorneau at us.ibm.com

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

From vpuvvada at in.ibm.com Fri Sep 8 06:00:46 2017
From: vpuvvada at in.ibm.com (Venkateswara R Puvvada)
Date: Fri, 8 Sep 2017 10:30:46 +0530
Subject: [gpfsug-discuss] Deletion of a pcache snapshot?
In-Reply-To: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de>
References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de>
Message-ID:

Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by the user with the mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshot deletion the mmpsnap command is used.

Which version of GPFS? Try with -p (undocumented) option.

mmdelsnapshot device snapname -j fset -p

~Venkat (vpuvvada at in.ibm.com)

From: "Malka, Janusz"
To: gpfsug main discussion list
Date: 09/08/2017 12:53 AM
Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot?
Sent by: gpfsug-discuss-bounces at spectrumscale.org

I had a similar issue, I had to recover the connection to home.

From: "John Hearns"
To: "gpfsug main discussion list"
Sent: Thursday, 7 September, 2017 17:52:19
Subject: [gpfsug-discuss] Deletion of a pcache snapshot?

Firmly lining myself up for a smack round the chops with a wet haddock...

I try to delete an AFM cache fileset which I created a few days ago (it has an NFS home).

mmdelfileset responds that:
Fileset obfuscated has 1 fileset snapshot(s).

When I try to delete the snapshot:
Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user.

I find this reference, which is about as useful as a wet haddock:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm

The advice of the gallery is sought, please.
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 8 06:21:47 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 8 Sep 2017 10:51:47 +0530 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig In-Reply-To: References: Message-ID: mmafmconfig command should be run on the target path (path specified in the afmTarget option when fileset is created). If many filesets are sharing the same target (ex independent writer mode) , enable AFM once on target path. Run the command at home cluster. mmafmconifg enable afmTarget ~Venkat (vpuvvada at in.ibm.com) From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/07/2017 09:04 PM Subject: Re: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig Sent by: gpfsug-discuss-bounces at spectrumscale.org I think you need to configure a gateway node (use mmchnode to change an existing node class to gateway) Then use mmafmconfig to setup export server maps on the gateway node. e.g. mmafmconfig ?add ?mapping1? ?export-map ?nfsServerIP?/?GatewayNode? (double quotes not required) mafmconfig show all Map name: mapping1 Export server map: IP/GatewayNode From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 07 September 2017 16:15 To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig If I have an AFM setup where the home is located on a generic NFS share, let?s say server:/volume/share When I come ot set this up does it make sense to run mmafmconfig on /volume/share ? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). 
Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=kKlSEJqmVE6q8Qt02JNaDLsewp13C0yRAmlfc_djRkk&s=JIbuXlCiReZx3ws5__6juuGC-sAqM74296BuyzgyNYg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From gellis at ocf.co.uk Fri Sep 8 08:04:51 2017 From: gellis at ocf.co.uk (Georgina Ellis) Date: Fri, 8 Sep 2017 07:04:51 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: Message-ID: <0CBB283A-A0A9-4FC9-A1CD-9E019D74CDB9@ocf.co.uk> I am still populating your lot 2 response - it is split across 3 word docs and a whole heap of emails so easier for me to keep going - I dropped u off a lot of emails to save filling your inbox :-) Could you poke around other tenders for the portal question please? Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. 
> > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** From john.hearns at asml.com Fri Sep 8 08:26:01 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 8 Sep 2017 07:26:01 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gellis at ocf.co.uk Fri Sep 8 08:33:51 2017 From: gellis at ocf.co.uk (Georgina Ellis) Date: Fri, 8 Sep 2017 07:33:51 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: Message-ID: <93DCF805-F703-4ED5-A079-A44992A9268C@ocf.co.uk> Apologies All, slip of the keyboard and not a comment on GPFS! Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** From Sandra.McLaughlin at astrazeneca.com Fri Sep 8 10:12:02 2017 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Fri, 8 Sep 2017 09:12:02 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: John, I had a period when I had to delete and remake AFM filesets rather frequently ? this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. 
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 8 11:57:14 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 8 Sep 2017 10:57:14 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Sandra, Thankyou for the help. 
I have a support ticket outstanding, and will see what is suggested. I am sure this is a simple matter of deleting the fileset as you say! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McLaughlin, Sandra M Sent: Friday, September 08, 2017 11:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? John, I had a period when I had to delete and remake AFM filesets rather frequently ? this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kkr at lbl.gov Fri Sep 8 11:58:05 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 8 Sep 2017 03:58:05 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) In-Reply-To: <6EF4187F-D8A1-4927-9E4F-4DF703DA04F5@lbl.gov> References: <6EF4187F-D8A1-4927-9E4F-4DF703DA04F5@lbl.gov> Message-ID: Hello, The agenda for the GPFS Day during HPCXXL is fairly fleshed out here: http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ See notes on registration below, which is free but required. Use the HPCXXL registration form, which has a $0 GPFS Day registration option. Hope to see some of you there. Best, Kristy > On Aug 21, 2017, at 3:33 PM, Kristy Kallback-Rose wrote: > > If you plan on attending the GPFS Day, please use the HPCXXL registration form (link to Eventbrite registration at the link below). The GPFS day is a free event, but you *must* register so we can make sure there are enough seats and food available. > > If you would like to speak or suggest a topic, please let me know. > > http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ > > The agenda is still being worked on, here are some likely topics: > > --RoadMap/Updates > --"New features - New Bugs? (Julich) > --GPFS + Openstack (CSCS) > --ORNL Update on Spider3-related GPFS work > --ANL Site Update > --File Corruption Session > > Best, > Kristy > >> On Aug 8, 2017, at 11:33 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> The GPFS Day of the HPCXXL conference is confirmed for Thursday, September 28th. Here is an updated URL, the agenda and registration are still being put together http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ . The GPFS Day will require registration, so we can make sure there is enough room (and coffee/food) for all attendees ?however, there will be no registration fee if you attend the GPFS Day only. >> >> I?ll send another update when the agenda is closer to settled. >> >> Cheers, >> Kristy >> >>> On Jul 7, 2017, at 3:32 PM, Kristy Kallback-Rose > wrote: >>> >>> Hello, >>> >>> More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. >>> >>> This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. >>> >>> The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. >>> >>> More as we get closer to the date and details are settled. >>> >>> Cheers, >>> Kristy >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc.ken.tw25qn at gmail.com Fri Sep 8 19:30:32 2017 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Fri, 8 Sep 2017 19:30:32 +0100 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: <93DCF805-F703-4ED5-A079-A44992A9268C@ocf.co.uk> Message-ID: Not on too many G&Ts Georgina? How are things. 
Ken Atkinson On 8 Sep 2017 08:33, "Georgina Ellis" wrote: Apologies All, slip of the keyboard and not a comment on GPFS! Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2. 3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2. 3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 22:14:04 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 8 Sep 2017 17:14:04 -0400 Subject: [gpfsug-discuss] multicluster security In-Reply-To: References: <83936033-ce82-0a9b-3714-1dbea4c317db@nasa.gov> Message-ID: <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> Interesting! Thank you for the explanation. This makes me wish GPFS had a client access model that more closely mimicked parallel NAS, specifically for this reason. That then got me wondering about pNFS support. I've not been able to find much about that but in theory Ganesha supports pNFS. Does anyone know of successful pNFS testing with GPFS and if so how one would set up such a thing? -Aaron On 08/25/2017 06:41 PM, IBM Spectrum Scale wrote: > > Hi Aaron, > > If cluster A uses the mmauth command to grant a file system read-only > access to a remote cluster B, nodes on cluster B can only mount that > file system with read-only access. But the only checking being done at > the RPC level is the TLS authentication. This should prevent non-root > users from initiating RPCs, since TLS authentication requires access > to the local cluster's private key. However, a root user on cluster B, > having access to cluster B's private key, might be able to craft RPCs > that may allow one to work around the checks which are implemented at > the file system level. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks > Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please > contact 1-800-237-5511 in the United States or your local IBM Service > Center in other countries. > > The forum is informally monitored as time permits and should not be > used for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for Aaron Knister ---08/21/2017 11:04:06 PM---Hi > Everyone, I have a theoretical question about GPFS multiAaron Knister > ---08/21/2017 11:04:06 PM---Hi Everyone, I have a theoretical question > about GPFS multiclusters and security. 
> > From: Aaron Knister > To: gpfsug main discussion list > Date: 08/21/2017 11:04 PM > Subject: [gpfsug-discuss] multicluster security > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I have a theoretical question about GPFS multiclusters and security. > Let's say I have clusters A and B. Cluster A is exporting a filesystem > as read-only to cluster B. > > Where does the authorization burden lay? Meaning, does the security rely > on mmfsd in cluster B to behave itself and enforce the conditions of the > multi-cluster export? Could someone using the credentials on a > compromised node in cluster B just start sending arbitrary nsd > read/write commands to the nsds from cluster A (or something along those > lines)? Do the NSD servers in cluster A do any sort of sanity or > security checking on the I/O requests coming from cluster B to the NSDs > they're serving to exported filesystems? > > I imagine any enforcement would go out the window with shared disks in a > multi-cluster environment since a compromised node could just "dd" over > the LUNs. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=oK_bEPbjuD7j6qLTHbe7HM4ujUlpcNYtX3tMW2QC7_w&s=BliMQ0pToLIIiO1jfyUp2Q3icewcONrcmHpsIj_hMtY&e= > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Fri Sep 8 22:21:00 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 08 Sep 2017 21:21:00 +0000 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: Hi, the code assumption is that the underlying device has no volatile write cache, i was absolute sure we have that somewhere in the FAQ, but i couldn't find it, so i will talk to somebody to correct this. if i understand https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt correct one could enforce this by setting REQ_FUA, but thats not explicitly set today, at least i can't see it. i will discuss this with one of our devs who owns this code and come back. sven On Thu, Sep 7, 2017 at 8:05 PM Aaron Knister wrote: > Thanks Sven. I didn't think GPFS itself was caching anything on that > layer, but it's my understanding that O_DIRECT isn't sufficient to force > I/O to be flushed (e.g. the device itself might have a volatile caching > layer). Take someone using ZFS zvol's as NSDs. I can write() all day log > to that zvol (even with O_DIRECT) but there is absolutely no guarantee > those writes have been committed to stable storage and aren't just > sitting in RAM until an fsync() occurs (or some other bio function that > causes a flush). I also don't believe writing to a SATA drive with > O_DIRECT will force cache flushes of the drive's writeback cache.. 
> although I just tested that one and it seems to actually trigger a scsi > cache sync. Interesting. > > -Aaron > > On 9/7/17 10:55 PM, Sven Oehme wrote: > > I am not sure what exactly you are looking for but all blockdevices are > > opened with O_DIRECT , we never cache anything on this layer . > > > > > > On Thu, Sep 7, 2017, 7:11 PM Aaron Knister > > wrote: > > > > Hi Everyone, > > > > This is something that's come up in the past and has recently > resurfaced > > with a project I've been working on, and that is-- it seems to me as > > though mmfsd never attempts to flush the cache of the block devices > its > > writing to (looking at blktrace output seems to confirm this). Is > this > > actually the case? I've looked at the gpl headers for linux and I > don't > > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > > GPFS may very well be using that I've missed. That's why I'm asking > :) > > > > I figure with FPO being pushed as an HDFS replacement using commodity > > drives this feature has *got* to be in the code somewhere. > > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Sat Sep 9 09:05:31 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Sat, 9 Sep 2017 10:05:31 +0200 Subject: [gpfsug-discuss] multicluster security In-Reply-To: <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> References: <83936033-ce82-0a9b-3714-1dbea4c317db@nasa.gov> <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Sep 11 01:43:56 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 10 Sep 2017 20:43:56 -0400 Subject: [gpfsug-discuss] tuning parameters question Message-ID: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> Hi All (but mostly Sven), I stumbled across this great gem: files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf and I'm wondering which, if any, of those tuning parameters are still relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is particularly ugly and the storage doesn't appear to be bottlenecked. 
I see a lot of waiters like these: Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' and I'm wondering if there's anything immediate one would suggest to help with that. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Mon Sep 11 01:50:39 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 10 Sep 2017 20:50:39 -0400 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> Message-ID: <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> As an aside, my initial attempt was to use Ganesha via CES but the performance was significantly worse than CNFS for this workload. The docs seem to suggest that CNFS performs better for metadata intensive workloads which certainly seems to fit the bill here. -Aaron On 9/10/17 8:43 PM, Aaron Knister wrote: > Hi All (but mostly Sven), > > I stumbled across this great gem: > > files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf > > and I'm wondering which, if any, of those tuning parameters are still > relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm > exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is > particularly ugly and the storage doesn't appear to be bottlenecked. 
> > I see a lot of waiters like these: > > Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > > and I'm wondering if there's anything immediate one would suggest to > help with that. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From stefan.dietrich at desy.de Mon Sep 11 08:40:14 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 11 Sep 2017 09:40:14 +0200 (CEST) Subject: [gpfsug-discuss] Switch from IPoIB connected mode to datagram with ESS 5.2.0? Message-ID: <743361352.9211728.1505115614463.JavaMail.zimbra@desy.de> Hello, during reading the upgrade docs for ESS 5.2.0, I noticed a change in the IPoIB mode. Now it specifies, that datagram (CONNECTED_MODE=no) instead of connected mode should be used. All earlier versions used connected mode. I am wondering about the reason for this change? Or is this only relevant for bonded IPoIB interfaces? Regards, Stefan -- ------------------------------------------------------------------------ Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) Ein Forschungszentrum der Helmholtz-Gemeinschaft Notkestr. 85 phone: +49-40-8998-4696 22607 Hamburg e-mail: stefan.dietrich at desy.de Germany ------------------------------------------------------------------------ From john.hearns at asml.com Mon Sep 11 08:41:54 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 11 Sep 2017 07:41:54 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Thankyou all for advice. The ?-p? option was the fix here (thankyou to IBM support). From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McLaughlin, Sandra M Sent: Friday, September 08, 2017 11:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? John, I had a period when I had to delete and remake AFM filesets rather frequently ? 
this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
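Putting the advice in this thread together, the teardown sequence Sandra describes combined with the undocumented -p flag Venkat mentions (and which John reports was the fix) would look roughly like this; the device, fileset and snapshot names below are made-up placeholders rather than the obfuscated names from the original post:

mmunlinkfileset fs1 afmcache -f
mmdelsnapshot fs1 afmcache.afm.1234 -j afmcache -p    # -p is undocumented, use only as advised by IBM support
mmdelfileset fs1 afmcache -f

Note Venkat's caveat that AFM recovery/resync snapshots are normally removed automatically once recovery or resync completes, and that peer snapshots are deleted with mmpsnap, so forcing the deletion of an internal pcache recovery snapshot should be a last resort.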
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Mon Sep 11 09:11:15 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 11 Sep 2017 10:11:15 +0200 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From ed.swindelles at uconn.edu Mon Sep 11 16:49:15 2017 From: ed.swindelles at uconn.edu (Swindelles, Ed) Date: Mon, 11 Sep 2017 15:49:15 +0000 Subject: [gpfsug-discuss] UConn hiring GPFS administrator Message-ID: The University of Connecticut is hiring three full time, permanent technical positions for its HPC team on the Storrs campus. One of these positions is focused on storage administration, including a GPFS cluster. I would greatly appreciate it if you would forward this announcement to contacts of yours who may have an interest in these positions. Here are direct links to the job descriptions and applications: HPC Storage Administrator http://s.uconn.edu/3tx HPC Systems Administrator (2 positions to be filled) http://s.uconn.edu/3tw Thank you, -- Ed Swindelles Team Lead for Research Technology University of Connecticut 860-486-4522 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Sep 11 23:15:10 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 11 Sep 2017 18:15:10 -0400 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> Message-ID: <9de64193-c60c-8ee1-b681-6cfe3993772b@nasa.gov> Thanks, Olaf. I ended up un-setting a bunch of settings that are now auto-tuned (worker1threads, worker3threads, etc.) and just set workerthreads as you suggest. That combined with increasing maxfilestocache to above the max concurrent open file threshold of the workload got me consistently with in 1%-3% of the performance of the same storage hardware running btrfs instead of GPFS. I think that's pretty darned good considering the additional complexity GPFS has over btrfs of being a clustered filesystem. Plus I now get NFS server failover for very little effort and without having to deal with corosync or pacemaker. -Aaron On 9/11/17 4:11 AM, Olaf Weiser wrote: > Hi Aaron , > > 0,0009 s response time for your meta data IO ... seems to be a very > good/fast storage BE.. which is hard to improve.. > you can raise the parallelism a bit for accessing metadata , but if this > will help to improve your "workload" is not assured > > The worker3threads parameter specifies the number of threads to use for > inode prefetch. Usually , I would suggest, that you should not touch > single parameters any longer. By the great improvements of the last few > releases.. GPFS can calculate / retrieve the right settings > semi-automatically... > You only need to set simpler "workerThreads" .. > > But in your case , you can see, if this more specific value will help > you out . > > depending on your blocksize and average filesize .. you may see > additional improvements when tuning nfsPrefetchStrategy , which tells > GPFS to consider all IOs wihtin */N/* blockboundaries as sequential ?and > starts prefetch > > l.b.n.t. set ignoreprefetchLunCount to yes .. (if not already done) . 
> this helps GPFS to use all available workerThreads > > cheers > olaf > > > > From: Aaron Knister > To: > Date: 09/11/2017 02:50 AM > Subject: Re: [gpfsug-discuss] tuning parameters question > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > As an aside, my initial attempt was to use Ganesha via CES but the > performance was significantly worse than CNFS for this workload. The > docs seem to suggest that CNFS performs better for metadata intensive > workloads which certainly seems to fit the bill here. > > -Aaron > > On 9/10/17 8:43 PM, Aaron Knister wrote: > > Hi All (but mostly Sven), > > > > I stumbled across this great gem: > > > > files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf > > > > and I'm wondering which, if any, of those tuning parameters are still > > relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm > > exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is > > particularly ugly and the storage doesn't appear to be bottlenecked. > > > > I see a lot of waiters like these: > > > > Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > > > and I'm wondering if there's anything immediate one would suggest to > > help with that. 
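Condensed into commands, the combination Olaf recommends above and that Aaron reports worked for him might look like the sketch below. The node class name and the numeric values are illustrative placeholders, not recommendations; maxFilesToCache in particular should be sized against the peak number of concurrently open files and only takes effect after the daemons are restarted:

mmchconfig workerThreads=512 -N cnfsNodes       # replaces hand-tuning worker1Threads, worker3Threads, etc.
mmchconfig maxFilesToCache=200000 -N cnfsNodes
mmchconfig nfsPrefetchStrategy=1 -N cnfsNodes   # treat I/O within N block boundaries as sequential and prefetch
mmchconfig ignorePrefetchLUNCount=yes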
> > > > -Aaron > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From zacekm at img.cas.cz Tue Sep 12 10:40:35 2017 From: zacekm at img.cas.cz (Michal Zacek) Date: Tue, 12 Sep 2017 11:40:35 +0200 Subject: [gpfsug-discuss] Wrong nodename after server restart Message-ID: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. 
gpfs version: 4.2.3-2 (CentOS 7) From secretary at gpfsug.org Tue Sep 12 15:22:41 2017 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Tue, 12 Sep 2017 15:22:41 +0100 Subject: [gpfsug-discuss] SS UG UK 2018 Message-ID: Dear all, A date for your diary, #SSUG18 in the UK will be taking place on April 18th & 19th 2018. Please mark it in your diaries now! We'll confirm other details (venue, agenda etc.) nearer the time, but the date is confirmed. Thanks, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Sep 12 16:01:21 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 12 Sep 2017 11:01:21 -0400 Subject: [gpfsug-discuss] Wrong nodename after server restart In-Reply-To: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> References: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> Message-ID: Michal, When a node is added to a cluster that has a different domain than the rest of the nodes in the cluster, the GPFS daemons running on the various nodes can develop an inconsistent understanding of what the common suffix of all the domain names are. The symptoms you show with the "tsctl shownodes up" output, and in particular the incorrect node names of the two nodes you restarted, as seen on a node you did not restart, are consistent with this problem. I also note your cluster appears to have the necessary pre-condition to trip on this problem, whale.img.cas.cz does not share a common suffix with the other nodes in the cluster. The common suffix of the other nodes in the cluster is ".img.local". Was whale.img.cas.cz recently added to the cluster? Unfortunately, the general work-around is to recycle all the nodes at once: mmshutdown -a, followed by mmstartup -a. I hope this helps. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 09/12/2017 05:41 AM Subject: [gpfsug-discuss] Wrong nodename after server restart Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. 
Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. gpfs version: 4.2.3-2 (CentOS 7) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=l_sz-tPolX87WmSf2zBhhPpggnfQJKp7-BqV8euBp7A&s=XSPGkKRMza8PhYQg8AxeKW9cOTNeCI9uph486_6Xajo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Tue Sep 12 16:36:06 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 12 Sep 2017 15:36:06 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: Message-ID: Well George is not the only one to have replied to the list with a one to one message. ? Remember folks, this mailing list has a *lot* of people on it. Hope my message is last that forgets who is in the 'To' field. Daniel Daniel Kidger Technical Sales Specialist, IBM UK IBM Spectrum Storage Software daniel.kidger at uk.ibm.com +44 (0)7818 522266 > On 8 Sep 2017, at 19:30, Ken Atkinson wrote: > > Not on too many G&Ts Georgina? > How are things. > Ken Atkinson > > On 8 Sep 2017 08:33, "Georgina Ellis" wrote: > Apologies All, slip of the keyboard and not a comment on GPFS! 
> > Sent from my iPhone > > > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > > > Send gpfsug-discuss mailing list submissions to > > gpfsug-discuss at spectrumscale.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > > gpfsug-discuss-request at spectrumscale.org > > > > You can reach the person managing the list at > > gpfsug-discuss-owner at spectrumscale.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of gpfsug-discuss digest..." > > > > > > Today's Topics: > > > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > > From: "Malka, Janusz" > > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > > Content-Type: text/plain; charset="utf-8" > > > > I had similar issue, I had to recover connection to home > > > > > > From: "John Hearns" > > To: "gpfsug main discussion list" > > Sent: Thursday, 7 September, 2017 17:52:19 > > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > > > Mmdelfileset responds that : > > > > Fileset obfuscated has 1 fileset snapshot(s). > > > > > > > > When I try to delete the snapshot: > > > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > > > > > I find this reference, which is about as useful as a wet haddock: > > > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > > > > > The advice of the gallery is sought, please. > > > > > > > > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- > > An HTML attachment was scrubbed... 
> > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 7 Sep 2017 21:16:34 +0000 > > From: "Christof Schmitt" > > To: gpfsug-discuss at spectrumscale.org > > Cc: gpfsug-discuss at spectrumscale.org > > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > > Message-ID: > > > > > > Content-Type: text/plain; charset="us-ascii" > > > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.mills at nasa.gov Tue Sep 12 17:06:23 2017 From: jonathan.mills at nasa.gov (Jonathan Mills) Date: Tue, 12 Sep 2017 12:06:23 -0400 (EDT) Subject: [gpfsug-discuss] Support for SLES 12 SP3 Message-ID: SLES 12 SP3 has been released. And for what it?s worth, there does not appear to be substantial changes in either kernel or glibc as compared to SLES 12 SP2. In fact, the latest SLES 12 SP2 kernel is ?4.4.74-92.29?, while the initial SLES 12 SP3 kernel is ?4.4.73-5.1?. Given this, I wanted to ask the team at IBM: 1) have you begun looking into SLES 12 SP3 yet? 2) if so, do you have any idea when you might release a fully supported version of Spectrum Scale for SLES 12 SP3? Those of us who run SLES and are looking to deploy new infrastructure this fall would prefer to do so on the latest rev of our OS, as opposed to one that is already on life support... -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From Greg.Lehmann at csiro.au Wed Sep 13 00:12:55 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 12 Sep 2017 23:12:55 +0000 Subject: [gpfsug-discuss] Support for SLES 12 SP3 In-Reply-To: References: Message-ID: <67f390a558244c41b154a7a6a9e5efe8@exch1-cdc.nexus.csiro.au> +1. We are interested in SLES 12 SP3 too. BTW had anybody done any comparisons of SLES 12 SP2 (4.4) kernel vs RHEL 7.3 in terms of GPFS IO performance? I would think the 4.4 kernel might give it an edge. I'll probably get around to comparing them myself one day, but if anyone else has some numbers... -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Mills Sent: Wednesday, 13 September 2017 2:06 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Support for SLES 12 SP3 SLES 12 SP3 has been released. And for what it?s worth, there does not appear to be substantial changes in either kernel or glibc as compared to SLES 12 SP2. In fact, the latest SLES 12 SP2 kernel is ?4.4.74-92.29?, while the initial SLES 12 SP3 kernel is ?4.4.73-5.1?. Given this, I wanted to ask the team at IBM: 1) have you begun looking into SLES 12 SP3 yet? 2) if so, do you have any idea when you might release a fully supported version of Spectrum Scale for SLES 12 SP3? 
Those of us who run SLES and are looking to deploy new infrastructure this fall would prefer to do so on the latest rev of our OS, as opposed to one that is already on life support... -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From scale at us.ibm.com Wed Sep 13 22:33:30 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 13 Sep 2017 17:33:30 -0400 Subject: [gpfsug-discuss] Fw: Wrong nodename after server restart Message-ID: ----- Forwarded by Eric Agar/Poughkeepsie/IBM on 09/13/2017 05:32 PM ----- From: IBM Spectrum Scale/Poughkeepsie/IBM To: Michal Zacek Date: 09/13/2017 05:29 PM Subject: Re: [gpfsug-discuss] Wrong nodename after server restart Sent by: Eric Agar Hello Michal, It should not be necessary to delete whale.img.cas.cz and rename it. But, that is an option you can take, if you prefer it. If you decide to take that option, please see the last paragraph of this response. The confusion starts at the moment a node is added to the active cluster where the new node does not have the same common domain suffix as the nodes that were already in the cluster. The confusion increases when the GPFS daemons on some nodes, but not all nodes, are recycled. Doing mmshutdown -a, followed by mmstartup -a, once after the new node has been added allows all GPFS daemons on all nodes to come up at the same time and arrive at the same answer to the question, "what is the common domain suffix for all the nodes in the cluster now?" In the case of your cluster, the answer will be "the common domain suffix is the empty string" or, put another way, "there is no common domain suffix"; that is okay, as long as all the GPFS daemons come to the same conclusion. After you recycle the cluster, you can check to make sure all seems well by running "tsctl shownodes up" on every node, and make sure the answer is correct on each node. If the mmshutdown -a / mmstartup -a recycle works, the problem should not recur with the current set of nodes in the cluster. Even as individual GPFS daemons are recycled going forward, they should still understand the cluster's nodes have no common domain suffix. However, I can imagine sequences of events that would cause the issue to occur again after nodes are deleted or added to the cluster while the cluster is active. For example, if whale.img.cas.cz were to be deleted from the current cluster, that action would restore the cluster to having a common domain suffix of ".img.local", but already running GPFS daemons would not realize it. If the delete of whale occurred while the cluster was active, subsequent recycling of the GPFS daemon on just a subset of the nodes would cause the recycled daemons to understand the common domain suffix to now be ".img.local". But, daemons that had not been recycled would still think there is no common domain suffix. The confusion would occur again. On the other hand, adding and deleting nodes to/from the cluster should not cause the issue to occur again as long as the cluster continues to have the same (in this case, no) common domain suffix. If you decide to delete whale.img.case.cz, rename it to have the ".img.local" domain suffix, and add it back to the cluster, it would be best to do so after all the GPFS daemons are shut down with mmshutdown -a, but before any of the daemons are restarted with mmstartup. This would allow all the subsequent running daemons to come to the conclusion that ".img.local" is now the common domain suffix. I hope this helps. 
Regards, Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: IBM Spectrum Scale Date: 09/13/2017 03:42 AM Subject: Re: [gpfsug-discuss] Wrong nodename after server restart Hello yes you are correct, Whale was added two days a go. It's necessary to delete whale.img.cas.cz from cluster before mmshutdown/mmstartup? If the two domains may cause problems in the future I can rename whale (and all planed nodes) to img.local suffix. Many thanks for the prompt reply. Regards Michal Dne 12.9.2017 v 17:01 IBM Spectrum Scale napsal(a): Michal, When a node is added to a cluster that has a different domain than the rest of the nodes in the cluster, the GPFS daemons running on the various nodes can develop an inconsistent understanding of what the common suffix of all the domain names are. The symptoms you show with the "tsctl shownodes up" output, and in particular the incorrect node names of the two nodes you restarted, as seen on a node you did not restart, are consistent with this problem. I also note your cluster appears to have the necessary pre-condition to trip on this problem, whale.img.cas.cz does not share a common suffix with the other nodes in the cluster. The common suffix of the other nodes in the cluster is ".img.local". Was whale.img.cas.cz recently added to the cluster? Unfortunately, the general work-around is to recycle all the nodes at once: mmshutdown -a, followed by mmstartup -a. I hope this helps. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 09/12/2017 05:41 AM Subject: [gpfsug-discuss] Wrong nodename after server restart Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. 
Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. gpfs version: 4.2.3-2 (CentOS 7) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=l_sz-tPolX87WmSf2zBhhPpggnfQJKp7-BqV8euBp7A&s=XSPGkKRMza8PhYQg8AxeKW9cOTNeCI9uph486_6Xajo&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Michal ???ek | Information Technologies +420 296 443 128 +420 296 443 333 michal.zacek at img.cas.cz www.img.cas.cz Institute of Molecular Genetics of the ASCR, v. v. i., V?de?sk? 1083, 142 20 Prague 4, Czech Republic ID: 68378050 | VAT ID: CZ68378050 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1997 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Thu Sep 14 01:18:51 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 13 Sep 2017 20:18:51 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. Message-ID: <52657.1505348331@turing-police.cc.vt.edu> So we have a number of very similar policy files that get applied for file migration etc. 
And they vary drastically in the runtime to process, apparently due to different selections on whether to do the work in parallel. Running a set of rules with 'mmapplypolicy -I defer' that look like this: RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' THRESHOLD(0,100,0) WEIGHT(FILE_SIZE) TO POOL 'VBI_FILES' FOR FILESET('vbi') WHERE (mb_allocated >= 8) for 10 filesets can scan 325M directory entries in 6 minutes, and sort and evaluate the policy in 3 more minutes. However, this takes a bit over 30 minutes for the scan and another 20 for sorting and policy evaluation over the same set of filesets: RULE 'VBI_FILES_RULE' LIST 'pruned_files' THRESHOLD(90,80) WEIGHT(FILE_SIZE) FOR FILESET('vbi') WHERE (mb_allocated >= 8) even though the output is essentially identical. Why is LIST so much more expensive than 'MIGRATE" with '-I defer'? I could understand if I had an expensive SHOW clause, but there isn't one here (and a different policy that I run that *does* have a big SHOW clause takes almost the same amount of time as the minimal LIST).... I'm thinking that it has *something* to do with the MIGRATE job outputting: [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. while the LIST job says: [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. (...) [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. (Both output the same message during the 'Directory entries scanned: 0.' phase, but I suspect MIGRATE is multi-threading that part as well, as it completes much faster). What's the controlling factor in mmapplypolicy's decision whether or not to parallelize the policy? From oehmes at gmail.com Thu Sep 14 01:28:46 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 14 Sep 2017 00:28:46 +0000 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: <52657.1505348331@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: can you please share the entire command line you are using ? also gpfs version, mmlsconfig output would help as well as if this is a shared storage filesystem or a system using local disks. thx. Sven On Wed, Sep 13, 2017 at 5:19 PM wrote: > So we have a number of very similar policy files that get applied for file > migration etc. And they vary drastically in the runtime to process, > apparently > due to different selections on whether to do the work in parallel. > > Running a set of rules with 'mmapplypolicy -I defer' that look like this: > > RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' > THRESHOLD(0,100,0) > WEIGHT(FILE_SIZE) > TO POOL 'VBI_FILES' > FOR FILESET('vbi') > WHERE (mb_allocated >= 8) > > for 10 filesets can scan 325M directory entries in 6 minutes, and sort and > evaluate the policy in 3 more minutes. > > However, this takes a bit over 30 minutes for the scan and another 20 for > sorting and policy evaluation over the same set of filesets: > > RULE 'VBI_FILES_RULE' LIST 'pruned_files' > THRESHOLD(90,80) > WEIGHT(FILE_SIZE) > FOR FILESET('vbi') > WHERE (mb_allocated >= 8) > > even though the output is essentially identical. Why is LIST so much more > expensive than 'MIGRATE" with '-I defer'? I could understand if I > had an > expensive SHOW clause, but there isn't one here (and a different policy > that I > run that *does* have a big SHOW clause takes almost the same amount of > time as > the minimal LIST).... 
> > I'm thinking that it has *something* to do with the MIGRATE job outputting: > > [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 > files scanned. > (...) > [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 > records scanned. > > while the LIST job says: > > [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. > (...) > [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. > > (Both output the same message during the 'Directory entries scanned: 0.' > phase, but I suspect MIGRATE is multi-threading that part as well, as it > completes much faster). > > What's the controlling factor in mmapplypolicy's decision whether or > not to parallelize the policy? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kh.atmane at gmail.com Thu Sep 14 13:49:55 2017 From: kh.atmane at gmail.com (atmane) Date: Thu, 14 Sep 2017 13:49:55 +0100 Subject: [gpfsug-discuss] Disk change problem in gss GNR Message-ID: dear all, I change A Disk In Gss Storage Server mmchcarrier BB1RGL --release --pdisk 'e1d1s02' mmchcarrier BB1RGL --replace --pdisk 'e1d1s02' after replace disk Now I Have 2 Discs In My Gss the first disc was well changed name = "e1d1s02" the second disk still after I use this cmd mmdelpdisk BB1RGL --pdisk e1d1s02#004 -a the disk is still in use i need to reboot the system or ?? mmlspdisk all | less pdisk: replacementPriority = 1000 name = "e1d1s02" device = "/dev/sdik,/dev/sdih" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "ok" capacity = 3000034656256 freeSpace = 1453846429696 fru = "00W1572" location = "SV30820390-1-2" WWN = "naa.5000C5008D783E37" server = "gss0-ib0" pdisk: replacementPriority = 1000 name = "e1d1s02#004" device = "" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "missing/noPath/systemDrain/adminDrain/noRGD/noVCD" capacity = 3000034656256 freeSpace = 1599875317760 fru = "00W1572" location = "" WWN = "naa.5000C50056714E83" server = "gss0-ib0" -- -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From makaplan at us.ibm.com Thu Sep 14 19:55:39 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 14 Sep 2017 14:55:39 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: <52657.1505348331@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: Read the doc again. Specify both -g and -N options on the command line to get fully parallel directory and inode/policy scanning. I'm curious as to what you're trying to do with THRESHOLD(0,100,0) ... Perhaps premigrate everything (that matches the other conditions)? You are correct about I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. If you don't see messages like that, you did not specify both -N and -g. From: valdis.kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Date: 09/13/2017 08:19 PM Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. Sent by: gpfsug-discuss-bounces at spectrumscale.org So we have a number of very similar policy files that get applied for file migration etc. 
And they vary drastically in the runtime to process, apparently due to different selections on whether to do the work in parallel. Running a set of rules with 'mmapplypolicy -I defer' that look like this: RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' THRESHOLD(0,100,0) WEIGHT(FILE_SIZE) TO POOL 'VBI_FILES' FOR FILESET('vbi') WHERE (mb_allocated >= 8) for 10 filesets can scan 325M directory entries in 6 minutes, and sort and evaluate the policy in 3 more minutes. However, this takes a bit over 30 minutes for the scan and another 20 for sorting and policy evaluation over the same set of filesets: RULE 'VBI_FILES_RULE' LIST 'pruned_files' THRESHOLD(90,80) WEIGHT(FILE_SIZE) FOR FILESET('vbi') WHERE (mb_allocated >= 8) even though the output is essentially identical. Why is LIST so much more expensive than 'MIGRATE" with '-I defer'? I could understand if I had an expensive SHOW clause, but there isn't one here (and a different policy that I run that *does* have a big SHOW clause takes almost the same amount of time as the minimal LIST).... I'm thinking that it has *something* to do with the MIGRATE job outputting: [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. while the LIST job says: [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. (...) [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. (Both output the same message during the 'Directory entries scanned: 0.' phase, but I suspect MIGRATE is multi-threading that part as well, as it completes much faster). What's the controlling factor in mmapplypolicy's decision whether or not to parallelize the policy? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=SGbwD3m5mZ16_vwIFK8Ym48lwdF1tVktnSao0a_tkfA&s=sLt9AtZiZ0qZCKzuQoQuyxN76_R66jfAwQxdIY-w2m0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Sep 14 21:09:40 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 14 Sep 2017 16:09:40 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: <26551.1505419780@turing-police.cc.vt.edu> On Thu, 14 Sep 2017 14:55:39 -0400, "Marc A Kaplan" said: > Read the doc again. Specify both -g and -N options on the command line to > get fully parallel directory and inode/policy scanning. Yeah, figured that out, with help from somebody. :) > I'm curious as to what you're trying to do with THRESHOLD(0,100,0) ... > Perhaps premigrate everything (that matches the other conditions)? Yeah, it's actually feeding to LTFS/EE - where we premigrate everything that matches to tape. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Sep 14 22:13:59 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 14 Sep 2017 17:13:59 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. 
In-Reply-To: <26551.1505419780@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> <26551.1505419780@turing-police.cc.vt.edu> Message-ID: BTW - we realize that mmapplypolicy -g and -N is a "gotcha" for some (many?) customer/admins -- so we're considering ways to make that easier -- but without "breaking" scripts and callbacks and what-have-yous that might depend on the current/old defaults... Always a balancing act -- considering that GPFS ne Spectrum Scale just hit its 20th birthday (by IBM reckoning) --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Fri Sep 15 11:47:19 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 15 Sep 2017 10:47:19 +0000 Subject: [gpfsug-discuss] ZIMON Sensors config files... Message-ID: Hi, Does anyone know how to use "mmperfmon config update" to get the "hostname =" field in the ZImonSensors.cfg file populated with the hostname of the node that it's been installed on? By default the field is empty and for some reason on our cluster it doesn't transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 15 16:37:13 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 15 Sep 2017 15:37:13 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? Message-ID: This is very probably off topic here.. I would be happy to get any responses off list. My question is has anyone here set up NFS re-export / proxy with nfs-ganesha? John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Mon Sep 18 01:14:52 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Mon, 18 Sep 2017 00:14:52 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? In-Reply-To: References: Message-ID: <5d1811f4d6ad4605bd2a7c7441f4dd1b@exch1-cdc.nexus.csiro.au> I am interested too, so maybe keep it on list? 
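For anyone who wants to experiment in the meantime: the re-export piece in ganesha is its PROXY FSAL. A rough, untested export sketch follows -- the back-end server address, paths and export id are invented, and the option names should be checked against the documentation for the ganesha version you are actually running:

    EXPORT {
        Export_Id = 101;
        Path = /data;               # export path on the back-end NFS server
        Pseudo = /reexport/data;    # where clients of the proxy see it
        Access_Type = RW;
        Protocols = 3, 4;
        FSAL {
            Name = PROXY;
            Srv_Addr = 192.0.2.10;  # back-end NFS server (example address)
        }
    }

Whether such a proxy in front of a Spectrum Scale CES cluster is a supported combination is a separate question, so treat this purely as a lab exercise.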
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: Saturday, 16 September 2017 1:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? This is very probably off topic here.. I would be happy to get any responses off list. My question is has anyone here set up NFS re-export / proxy with nfs-ganesha? John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.lefebvre+gpfsug at calculquebec.ca Mon Sep 18 20:16:57 2017 From: richard.lefebvre+gpfsug at calculquebec.ca (Richard Lefebvre) Date: Mon, 18 Sep 2017 15:16:57 -0400 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Message-ID: Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 18 20:27:49 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Sep 2017 19:27:49 +0000 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Message-ID: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> You do realize 3.5 is out of service, correct? You should be looking at upgrading :-) Catching this is real time, when you have a large number of nodes is going to be tough. How you recognizing that the file system is overloaded? Waiters? Looking at which nodes/NSDs have the longest/largest waiters may provide a clue. You might also take a look at mmpmon ? it?s a bit difficult to use in its raw state, but it does provide some good stats on a per file system basis. But you need to track these over times to get what you need. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Richard Lefebvre Reply-To: gpfsug main discussion list Date: Monday, September 18, 2017 at 2:18 PM To: gpfsug Subject: [EXTERNAL] [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scale at us.ibm.com Tue Sep 19 07:47:42 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 19 Sep 2017 14:47:42 +0800 Subject: [gpfsug-discuss] ZIMON Sensors config files... In-Reply-To: References: Message-ID: Hi Neil, Have you tried these steps? mmperfmon config show --config-file /tmp/a vi /tmp/a mmperfmon config update --collectors oc8757286465 --config-file /tmp/a mmperfmon config show Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/15/2017 06:48 PM Subject: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone know how to use ?mmperfmon config update? to get the ?hostname =? field in the ZImonSensors.cfg file populated with the hostname of the node that it?s been installed on? By default the field is empty and for some reason on our cluster it doesn?t transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=JJA1q39zaRyjClihY50646c-CyY4ZvrmpSjR1qs5rTc&s=GWOiCpEHiZ_TqlFj0AeKmjcccnez-X2rHMa5UtvGPTk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Tue Sep 19 07:54:50 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 19 Sep 2017 14:54:50 +0800 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 In-Reply-To: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> References: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> Message-ID: Hi Richard, Is any of tool in https://www.ibm.com/developerworks/community/wikis/home?_escaped_fragment_=/wiki/General%2520Parallel%2520File%2520System%2520%2528GPFS%2529/page/Display%2520per%2520node%2520IO%2520statstics can help you? BTW, I agree with Bob that 3.5 is out-of-service. Without an extended service, you should consider to upgrade your cluster as soon as possible. 
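As a rough starting point with what 3.5 already ships, the mmpmon counters Bob mentioned can be sampled on every node and compared over an interval. A sketch only -- the node list and interval are placeholders:

    # snapshot the per-node I/O counters, wait, snapshot again
    mmdsh -N all 'echo io_s | /usr/lpp/mmfs/bin/mmpmon -p -s' > /tmp/iops.t0
    sleep 60
    mmdsh -N all 'echo io_s | /usr/lpp/mmfs/bin/mmpmon -p -s' > /tmp/iops.t1
    # mmdsh prefixes each line with the node name, so the nodes whose
    # read/write byte and request counters grow fastest between the two
    # snapshots are the ones generating the load

(Using fs_io_s instead of io_s gives the same counters broken down per file system.)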
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 09/19/2017 03:28 AM Subject: Re: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org You do realize 3.5 is out of service, correct? You should be looking at upgrading :-) Catching this is real time, when you have a large number of nodes is going to be tough. How you recognizing that the file system is overloaded? Waiters? Looking at which nodes/NSDs have the longest/largest waiters may provide a clue. You might also take a look at mmpmon ? it?s a bit difficult to use in its raw state, but it does provide some good stats on a per file system basis. But you need to track these over times to get what you need. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Richard Lefebvre Reply-To: gpfsug main discussion list Date: Monday, September 18, 2017 at 2:18 PM To: gpfsug Subject: [EXTERNAL] [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=AYwUf61wv-Hq63KU7veQSxavdZy-e9eT9bkJFav8MVU&s=W42AQE74bvmOlw7P0D0wTqT0Rxop4KktnXeuDeGGdmk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From rohwedder at de.ibm.com Tue Sep 19 08:42:46 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 19 Sep 2017 09:42:46 +0200 Subject: [gpfsug-discuss] ZIMON Sensors config files... In-Reply-To: References: Message-ID: Hello Neil, While the description below provides a way on how to edit the hostname parameter, you should not have the need to edit the "hostname" parameter. Sensors use the hostname() call to get the hostname where the sensor is running and use this as key in the performance database, which is what you typically want to see. From the description you provide I assume you want to have a sensor running on every node that has the perfmon designation? 
There could be different issues: > In order to enable sensors on every node, you need to ensure there is no "restrict" clause in the sensor description, or the restrict clause has to be set correctly > There could be some other communication issue between sensors and collectors. Restart sensors and collectors and check the logfiles in /var/log/zimon/. You should be able to see which sensors start up and if they can connect. > Can you check if you have the perfmon designation set for the nodes where you expect data from (mmlscluster) Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina K?deritz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "IBM Spectrum Scale" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 09/19/2017 08:48 AM Subject: Re: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Neil, Have you tried these steps? mmperfmon config show --config-file /tmp/a vi /tmp/a mmperfmon config update --collectors oc8757286465 --config-file /tmp/a mmperfmon config show Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. Inactive hide details for "Wilson, Neil" ---09/15/2017 06:48:26 PM---Hi, Does anyone know how to use "mmperfmon config update" "Wilson, Neil" ---09/15/2017 06:48:26 PM---Hi, Does anyone know how to use "mmperfmon config update" to get the "hostname =" field in the ZImon From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/15/2017 06:48 PM Subject: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone know how to use ?mmperfmon config update? to get the ?hostname =? field in the ZImonSensors.cfg file populated with the hostname of the node that it?s been installed on? By default the field is empty and for some reason on our cluster it doesn?t transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
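For reference, the checks listed above map onto a handful of commands; a rough sketch, with the node name being a placeholder:

    mmlscluster                                # 'perfmon' should appear in the node's designation
    mmchnode --perfmon -N somenode             # add the designation if it is missing
    mmperfmon config show | grep -i restrict   # look for restrict clauses limiting where sensors run
    systemctl restart pmsensors                # on the node(s) that should send metrics
    systemctl restart pmcollector              # on the collector node(s)
    tail /var/log/zimon/*.log                  # confirm the sensors start up and connect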
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=JJA1q39zaRyjClihY50646c-CyY4ZvrmpSjR1qs5rTc&s=GWOiCpEHiZ_TqlFj0AeKmjcccnez-X2rHMa5UtvGPTk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Ow2bpnoab1kboH2xuSUrbx65ALeoAAicG7csl1sV-Qc&s=qZ1XUXWfOayLSSuvcCyHQ2ZgY1mu0Zs3kmpgeVQUCYI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D696444.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mnaineni at in.ibm.com Tue Sep 19 12:50:50 2017 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Tue, 19 Sep 2017 11:50:50 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? (Greg.Lehmann@csiro.au) Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Sep 19 22:02:03 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 19 Sep 2017 21:02:03 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? Message-ID: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Wed Sep 20 00:39:37 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 19 Sep 2017 23:39:37 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? Message-ID: OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 02:21:36 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Tue, 19 Sep 2017 18:21:36 -0700 Subject: [gpfsug-discuss] RoCE not playing ball Message-ID: Hi All, Weirdness with a RoCE interface - verbs is not playing ball and is complaining about the inet6 address not matching up: 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version >= 1.1) loaded and initialized. 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981E1 state DOWN 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 with GID c081f9feff078a26. Please check if the correct inet6 address for the corresponding IP network interface is set 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid verbsPorts defined. Anyone run into this before? I have another node imaged the *exact* same way and no dice. Have tried a variety of drivers, cards, etc, same result every time. Cheers, Barry -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Sep 20 04:07:18 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 20 Sep 2017 11:07:18 +0800 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: References: Message-ID: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. 
mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=mBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y&s=YJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Wed Sep 20 04:33:16 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 20 Sep 2017 11:33:16 +0800 Subject: [gpfsug-discuss] Disk change problem in gss GNR In-Reply-To: References: Message-ID: Hi Atmane, In terms of this kind of disk management question, I would like to suggest to open a PMR to make IBM service help you. mmdelpdisk command would not need to reboot system to take effect. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: atmane To: "gpfsug-discuss at spectrumscale.org" Date: 09/14/2017 08:50 PM Subject: [gpfsug-discuss] Disk change problem in gss GNR Sent by: gpfsug-discuss-bounces at spectrumscale.org dear all, I change A Disk In Gss Storage Server mmchcarrier BB1RGL --release --pdisk 'e1d1s02' mmchcarrier BB1RGL --replace --pdisk 'e1d1s02' after replace disk Now I Have 2 Discs In My Gss the first disc was well changed name = "e1d1s02" the second disk still after I use this cmd mmdelpdisk BB1RGL --pdisk e1d1s02#004 -a the disk is still in use i need to reboot the system or ?? mmlspdisk all | less pdisk: replacementPriority = 1000 name = "e1d1s02" device = "/dev/sdik,/dev/sdih" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "ok" capacity = 3000034656256 freeSpace = 1453846429696 fru = "00W1572" location = "SV30820390-1-2" WWN = "naa.5000C5008D783E37" server = "gss0-ib0" pdisk: replacementPriority = 1000 name = "e1d1s02#004" device = "" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "missing/noPath/systemDrain/adminDrain/noRGD/noVCD" capacity = 3000034656256 freeSpace = 1599875317760 fru = "00W1572" location = "" WWN = "naa.5000C50056714E83" server = "gss0-ib0" -- -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFbA&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hQ86ctTaI7i14NrB-58_SzqSWnCR8p6b5bFxtzNcSbk&s=mthjH7ebhnNlSJl71hFjF4wZU0iygm3I9wH_Bu7_3Ds&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Wed Sep 20 06:00:49 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 20 Sep 2017 07:00:49 +0200 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Wed Sep 20 06:13:13 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 05:13:13 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , Message-ID: Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. 
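(For reference, the NFS-only flow the earlier replies describe would look roughly like the following -- the export path and client spec are placeholders, and as this thread shows, some code levels refuse the userdefined step unless SMB is also enabled:

    mmces service enable NFS
    mmuserauth service create --data-access-method file --type userdefined
    mmnfs export add /gpfs/fs1/export \
        --client 'client1(Access_Type=RW,Protocols=3,Squash=no_root_squash)'

)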
I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Sep 20 06:33:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 05:33:14 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: I should have said, here are the package versions: [root at sgate1 ~]# rpm -qa | grep gpfs gpfs.gpl-4.2.2-3.noarch gpfs.docs-4.2.2-3.noarch gpfs.base-4.2.2-3.x86_64 gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.2-3.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm32_2.el7.x86_64 gpfs.ext-4.2.2-3.x86_64 gpfs.msg.en_US-4.2.2-3.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.2-3.x86_64 ________________________________________ From: Jonathon A Anderson Sent: Tuesday, September 19, 2017 11:13:13 PM To: gpfsug main discussion list Cc: varun.mittal at in.ibm.com; Mark.Bush at siriuscom.com Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. 
Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. 
Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From gangqiu at cn.ibm.com Wed Sep 20 06:58:15 2017 From: gangqiu at cn.ibm.com (Gang Qiu) Date: Wed, 20 Sep 2017 13:58:15 +0800 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: Do you set ip address for these adapters? Refer to the description of verbsRdmaCm in ?Command and Programming Reference': If RDMA CM is enabled for a node, the node will only be able to establish RDMA connections using RDMA CM to other nodes with verbsRdmaCm enabled. RDMA CM enablement requires IPoIB (IP over InfiniBand) with an active IP address for each port. Although IPv6 must be enabled, the GPFS implementation of RDMA CM does not currently support IPv6 addresses, so an IPv4 address must be used. Regards, Gang Qiu ********************************************************************************************** IBM China Systems & Technology Lab Tel: 86-10-82452193 Fax: 86-10-82452312 Moble: 132-6134-8284 Email: gangqiu at cn.ibm.com Address: Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No. 8 Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, P.R.China ??????????????8???????28???????????100193 ********************************************************************************************** From: "Olaf Weiser" To: gpfsug main discussion list Date: 09/20/2017 01:01 PM Subject: Re: [gpfsug-discuss] RoCE not playing ball Sent by: gpfsug-discuss-bounces at spectrumscale.org is ib_read_bw working ? just test it between the two nodes ... From: Barry Evans To: gpfsug main discussion list Date: 09/20/2017 03:21 AM Subject: [gpfsug-discuss] RoCE not playing ball Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Weirdness with a RoCE interface - verbs is not playing ball and is complaining about the inet6 address not matching up: 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version >= 1.1) loaded and initialized. 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981E1 state DOWN 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 with GID c081f9feff078a26. Please check if the correct inet6 address for the corresponding IP network interface is set 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid verbsPorts defined. Anyone run into this before? I have another node imaged the *exact* same way and no dice. Have tried a variety of drivers, cards, etc, same result every time. Cheers, Barry This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=NCthMXTjizwdEVDBqoDwAfRswiFbdQVHRb4mzseFLEM&m=u155tVFn5u91gqIsTXSOSVvpbR7GQRPoVpviUDH73R0&s=63nY5ozD8mej1jefNBZjLGCkNOFD9-swr-lc7CRPbrM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From tortay at cc.in2p3.fr Wed Sep 20 09:03:54 2017 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Wed, 20 Sep 2017 10:03:54 +0200 Subject: [gpfsug-discuss] CCR cluster down for the count? 
In-Reply-To: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> References: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> Message-ID: <853ffcf7-7900-457b-0d8a-2c63886ed245@cc.in2p3.fr> On 19/09/2017 23:02, Buterbaugh, Kevin L wrote: > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? > Hello, I have had the same issue multiple times. The "trick" is to execute "/usr/lpp/mmfs/bin/mmcommon startCcrMonitor" on a majority of quorum nodes (once they have the correct configuration files) to be able to start the cluster. I noticed a call to the above command in the "gpfs.gplbin" spec file in the "%postun" section (when doing RPM upgrades, if I'm not mistaken). . Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From r.sobey at imperial.ac.uk Wed Sep 20 09:23:37 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Sep 2017 08:23:37 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , Message-ID: This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. 
I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From douglasof at us.ibm.com Wed Sep 20 09:28:44 2017 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Wed, 20 Sep 2017 08:28:44 +0000 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC Message-ID: Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. For more information http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ Doug Mobile -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Wed Sep 20 11:47:35 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Wed, 20 Sep 2017 12:47:35 +0200 Subject: [gpfsug-discuss] WANTED: Official support statement using Spectrum Scale 4.2.x with Oracle DB v12 Message-ID: Hi folks, is anyone aware if there is now an official support statement for Spectrum Scale 4.2.x? As far as my understanding goes - we currently have an "older" official support statement for v4.1 with Oracle. Many thanks up-front for any useful hints ... :) Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 15225079.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Sep 20 14:55:28 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 20 Sep 2017 13:55:28 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: References: Message-ID: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. 
ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. 
Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=mBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y&s=YJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 15:17:34 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Sep 2017 07:17:34 -0700 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: Yep, IP's set ok. We did try with ipv6 off to see what would happen, then turned it back on again. There are ipv6 addresses on the cards, but ipv4 is the only thing actually being used. On Tue, Sep 19, 2017 at 10:58 PM, Gang Qiu wrote: > > > > Do you set ip address for these adapters? > > Refer to the description of verbsRdmaCm in ?Command and Programming > Reference': > > If RDMA CM is enabled for a node, the node will only be able to establish > RDMA connections > using RDMA CM to other nodes with *verbsRdmaCm *enabled. RDMA CM > enablement requires > IPoIB (IP over InfiniBand) with an active IP address for each port. > Although IPv6 must be > enabled, the GPFS implementation of RDMA CM does not currently support > IPv6 addresses, so > an IPv4 address must be used. > > > > Regards, > Gang Qiu > > ************************************************************ > ********************************** > IBM China Systems & Technology Lab > Tel: 86-10-82452193 > Fax: 86-10-82452312 > Moble: 132-6134-8284 > Email: gangqiu at cn.ibm.com > Address: Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No. 8 > Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, > P.R.China > ??????????????8???????28???????????100193 > ************************************************************ > ********************************** > > > > From: "Olaf Weiser" > To: gpfsug main discussion list > Date: 09/20/2017 01:01 PM > Subject: Re: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > is ib_read_bw working ? > just test it between the two nodes ... 
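A minimal way to run that check, assuming the perftest package (which provides ib_read_bw) is installed and reusing the mlx4_0 / port 1 names from the log above (the device, port and address below are taken from this thread and would need adapting to the actual setup):

    # on the first node, start the server side listening on the RoCE port
    ib_read_bw -d mlx4_0 -i 1 -R

    # on the second node, point the client at the first node's IP on that interface
    # (0x0000FFFFAC106404 in the discover lines looks like the IPv4-mapped GID for 172.16.100.4)
    ib_read_bw -d mlx4_0 -i 1 -R 172.16.100.4

    # if it fails, dump the GID table and check the interface addresses
    ibv_devinfo -v -d mlx4_0
    ip -6 addr show

The -R option makes ib_read_bw connect through the RDMA connection manager, so it exercises roughly the same rdma_cm + GID/IP lookup that the "interface not found for port 1 of device mlx4_0" parse error above is complaining about; it is a reasonable sanity check that RoCE itself works before blaming the verbsPorts configuration.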
> > > > > From: Barry Evans > To: gpfsug main discussion list > Date: 09/20/2017 03:21 AM > Subject: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Weirdness with a RoCE interface - verbs is not playing ball and is > complaining about the inet6 address not matching up: > > 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes > verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version > >= 1.1) loaded and initialized. > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced > from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 > * nspdQueues 1)). > 2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981E1 state DOWN > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE > 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 > 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort > mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 > with GID c081f9feff078a26. Please check if the correct inet6 address for > the corresponding IP network interface is set > 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 > 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. > 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid > verbsPorts defined. > > > Anyone run into this before? I have another node imaged the *exact* same > way and no dice. Have tried a variety of drivers, cards, etc, same result > every time. > > Cheers, > Barry > > > > > > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. 
Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > NCthMXTjizwdEVDBqoDwAfRswiFbdQVHRb4mzseFLEM&m= > u155tVFn5u91gqIsTXSOSVvpbR7GQRPoVpviUDH73R0&s= > 63nY5ozD8mej1jefNBZjLGCkNOFD9-swr-lc7CRPbrM&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 15:23:21 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Sep 2017 07:23:21 -0700 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: It has worked, yes, and while the issue has been present. At the moment it's not working, but I'm not entirely surprised with the amount it's been poked at. Cheers, Barry On Tue, Sep 19, 2017 at 10:00 PM, Olaf Weiser wrote: > is ib_read_bw working ? > just test it between the two nodes ... > > > > > From: Barry Evans > To: gpfsug main discussion list > Date: 09/20/2017 03:21 AM > Subject: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Weirdness with a RoCE interface - verbs is not playing ball and is > complaining about the inet6 address not matching up: > > 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes > verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version > >= 1.1) loaded and initialized. > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced > from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 > * nspdQueues 1)). 
> 2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981E1 state DOWN > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE > 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 > 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort > mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 > with GID c081f9feff078a26. Please check if the correct inet6 address for > the corresponding IP network interface is set > 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 > 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. > 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid > verbsPorts defined. > > > Anyone run into this before? I have another node imaged the *exact* same > way and no dice. Have tried a variety of drivers, cards, etc, same result > every time. > > Cheers, > Barry > > > > > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. 
Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Sep 20 17:00:15 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 20 Sep 2017 09:00:15 -0700 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC In-Reply-To: References: Message-ID: Thanks Doug. If you plan to go, *do register*. GPFS Day is free, but we need to know how many will attend. Register using the link on the HPCXXL event page below. Cheers, Kristy > On Sep 20, 2017, at 1:28 AM, Douglas O'flaherty wrote: > > > Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. > > > For more information > http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ > > Doug > > Mobile > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Sep 20 17:27:48 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 20 Sep 2017 16:27:48 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <20170920114844.6bf9f27b@osc.edu> References: <20170920114844.6bf9f27b@osc.edu> Message-ID: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Hi Ed, Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... Kevin /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 2983 1 0 Sep18 ? 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 6512 1 0 Sep18 ? 
00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes 3,0,10.0.6.215,,testnsd3.vampire 1,0,10.0.6.213,,testnsd1.vampire 2,0,10.0.6.214,,testnsd2.vampire /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 
1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen root at testnsd2# On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: I've run into this before. We didn't use to use CCR. And restoring nodes for us is a major pain in the rear as we only allow one-way root SSH, so we have a number of useful little scripts to work around problems like this. Assuming that you have all the necessary files copied to the correct places, you can manually kick off CCR. I think my script does something like: (copy the encryption key info) scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor you should then see like 2 copies of it running under mmksh. Ed On Wed, 20 Sep 2017 13:55:28 +0000 "Buterbaugh, Kevin L" > wrote: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. 
ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. 
vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Sep 20 18:48:26 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 20 Sep 2017 19:48:26 +0200 Subject: [gpfsug-discuss] CCR cluster down for the count? 
In-Reply-To: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Message-ID: <1f0b2657-8ca3-7b35-95f3-7c4edb6c0818@ugent.be> hi kevin, we were hit by similar issue when we did something not so smart: we had a 5 node quorum, and we wanted to replace 1 test node with 3 more production quorum node. we however first removed the test node, and then with 4 quorum nodes we did mmshutdown for some other config modifications. when we tried to start it, we hit the same "Not enough CCR quorum nodes available" errors. also, none of the ccr commands were helpful; they also hanged, even simple ones like show etc etc. what we did in the end was the following (and some try-and-error): from the /var/adm/ras/mmsdrserv.log logfiles we guessed that we had some sort of split brain paxos cluster (some reported " ccrd: recovery complete (rc 809)", some same message with 'rc 0' and some didn't have the recovery complete on the last line(s)) * stop ccr everywhere mmshutdown -a mmdsh -N all pkill -9 -f mmccr * one by one, start the paxos cluster using mmshutdown on the quorum nodes (mmshutdown will start ccr and there is no unit or something to help with that). * the nodes will join after 3-4 minutes and report "recovery complete"; wait for it before you start another one * the trial-and-error part was that sometimes there was recovery complete with rc=809, sometimes with rc=0. in the end, once they all had same rc=0, paxos was happy again and eg mmlsconfig worked again. this left a very bad experience with CCR with us, but we want to use ces, so no real alternative (and to be honest, with odd number of quorum, we saw no more issues, everyting was smooth). in particular we were missing * unit files for all extra services that gpfs launched (mmccrmoniotr, mmsysmon); so we can monitor and start/stop them cleanly * ccr commands that work with broken paxos setup; eg to report that the paxos cluster is broken or operating in some split-brain mode. anyway, YMMV and good luck. stijn On 09/20/2017 06:27 PM, Buterbaugh, Kevin L wrote: > Hi Ed, > > Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. > > Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... > > Kevin > > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort > testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd2: root 2983 1 0 Sep18 ? 
00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 6512 1 0 Sep18 ? 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort > testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached > testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed > testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes > testdellnode1: total 12 > testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed > testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached > testgateway: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testgateway: total 12 > testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached > testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed > testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks > testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth > testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes > testnsd1: total 8 > testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached > testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed > testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 > testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 > testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks > testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd2: total 16 > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed > testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks > testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth > testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd3: total 8 > testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed > testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached > testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testsched: total 12 > /var/mmfs/gen > root at testnsd2# more ../ccr/ccr.nodes > 3,0,10.0.6.215,,testnsd3.vampire > 1,0,10.0.6.213,,testnsd1.vampire > 2,0,10.0.6.214,,testnsd2.vampire > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" > testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs > testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs > testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs > testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > testsched: -rw-r--r--. 
1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" > testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" > testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > /var/mmfs/gen > root at testnsd2# > > On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: > > I've run into this before. We didn't use to use CCR. And restoring nodes for > us is a major pain in the rear as we only allow one-way root SSH, so we have a > number of useful little scripts to work around problems like this. > > Assuming that you have all the necessary files copied to the correct > places, you can manually kick off CCR. > > I think my script does something like: > > (copy the encryption key info) > > scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ > > scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ > > scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ > > :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor > > you should then see like 2 copies of it running under mmksh. > > Ed > > > On Wed, 20 Sep 2017 13:55:28 +0000 > "Buterbaugh, Kevin L" > wrote: > > Hi All, > > testnsd1 and testnsd3 both had hardware issues (power supply and internal HD > respectively). Given that they were 12 year old boxes, we decided to replace > them with other boxes that are a mere 7 years old ? keep in mind that this is > a test cluster. > > Disabling CCR does not work, even with the undocumented ??force? option: > > /var/mmfs/gen > root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force > mmchcluster: Unable to obtain the GPFS configuration file lock. > mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. > mmchcluster: Processing continues without lock protection. > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key > fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key > fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? 
The authenticity of host 'vmp608.vampire > (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp612.vampire > (10.0.21.12)' can't be established. ECDSA key fingerprint is > SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is > MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's password: > testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire > remote shell process had return code 255. testnsd1.vampire: Host key > verification failed. mmdsh: testnsd1.vampire remote shell process had return > code 255. vmp609.vampire: Host key verification failed. mmdsh: > vmp609.vampire remote shell process had return code 255. vmp608.vampire: > Host key verification failed. mmdsh: vmp608.vampire remote shell process had > return code 255. vmp612.vampire: Host key verification failed. mmdsh: > vmp612.vampire remote shell process had return code 255. > > root at vmp610.vampire's password: vmp610.vampire: > Permission denied, please try again. > > root at vmp610.vampire's password: vmp610.vampire: > Permission denied, please try again. > > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. > > Verifying GPFS is stopped on all nodes ... > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key > fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key > fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp609.vampire > (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire > (10.0.6.213)' can't be established. ECDSA key fingerprint is > SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is > MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's password: > root at vmp610.vampire's password: > root at vmp610.vampire's password: > > testnsd3.vampire: Host key verification failed. > mmdsh: testnsd3.vampire remote shell process had return code 255. > vmp612.vampire: Host key verification failed. > mmdsh: vmp612.vampire remote shell process had return code 255. > vmp608.vampire: Host key verification failed. > mmdsh: vmp608.vampire remote shell process had return code 255. 
> vmp609.vampire: Host key verification failed. > mmdsh: vmp609.vampire remote shell process had return code 255. > testnsd1.vampire: Host key verification failed. > mmdsh: testnsd1.vampire remote shell process had return code 255. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. mmchcluster: Command failed. > Examine previous error messages to determine cause. /var/mmfs/gen > root at testnsd2# > > I believe that part of the problem may be that there are 4 client nodes that > were removed from the cluster without removing them from the cluster (done by > another SysAdmin who was in a hurry to repurpose those machines). They?re up > and pingable but not reachable by GPFS anymore, which I?m pretty sure is > making things worse. > > Nor does Loic?s suggestion of running mmcommon work (but thanks for the > suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to > start the cluster up failed: > > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# > > Thanks. > > Kevin > > On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > > wrote: > > > Hi Kevin, > > Let's me try to understand the problem you have. What's the meaning of node > died here. Are you mean that there are some hardware/OS issue which cannot be > fixed and OS cannot be up anymore? > > I agree with Bob that you can have a try to disable CCR temporally, restore > cluster configuration and enable it again. > > Such as: > > 1. Login to a node which has proper GPFS config, e.g NodeA > 2. Shutdown daemon in all client cluster. > 3. mmchcluster --ccr-disable -p NodeA > 4. mmsdrrestore -a -p NodeA > 5. mmauth genkey propagate -N testnsd1, testnsd3 > 6. mmchcluster --ccr-enable > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in other > countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run > across this before, and it?s because of a bug (as I recall) having to do with > CCR and > > From: "Oesterlin, Robert" > > To: gpfsug > main discussion list > > > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for > the count? 
Sent by: > gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > OK ? I?ve run across this before, and it?s because of a bug (as I recall) > having to do with CCR and quorum. What I think you can do is set the cluster > to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back > up and then re-enable ccr. > > I?ll see if I can find this in one of the recent 4.2 release nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > > > on behalf of "Buterbaugh, Kevin L" > > > Reply-To: gpfsug main discussion list > > > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > > > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? > > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD > servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while > back. I did nothing about it at the time because it was due to be life-cycled > as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now > resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve > done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also > done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from > testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to > testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? > ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any > crucial details here ? does anyone have an idea what I need to do? Thanks? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 > > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From jonathon.anderson at colorado.edu Wed Sep 20 19:55:04 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 18:55:04 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? 
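(For readers working through this thread: the replies further down converge on roughly the following sequence for a CES node where only NFS is wanted. This is only a sketch - the export path and client pattern are the examples from the message above, and removing an existing file-auth configuration is only appropriate if nothing else depends on it:

/usr/lpp/mmfs/bin/mmuserauth service remove --data-access-method file    # clear any partially configured file auth, if one exists
/usr/lpp/mmfs/bin/mmces service disable smb                              # with SMB disabled, userdefined auth no longer complains about it
/usr/lpp/mmfs/bin/mmces service list -a                                  # confirm which protocol services remain enabled
/usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined
mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(Access_Type=rw,Squash=root_squash);dtn*.rc.int.colorado.edu(Access_Type=rw,Squash=root_squash)'

Note that the Access_Type= and Squash= keywords have to be spelled out in the client definition, as a later reply points out.)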
~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... 
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss From ewahl at osc.edu Wed Sep 20 20:07:39 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 20 Sep 2017 15:07:39 -0400 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Message-ID: <20170920150739.39f0a4a0@osc.edu> So who was the ccrmaster before? What is/was the quorum config? (tiebreaker disks?) what does 'mmccr check' say? Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info from the error? Ed On Wed, 20 Sep 2017 16:27:48 +0000 "Buterbaugh, Kevin L" wrote: > Hi Ed, > > Thanks for the suggestion ? that?s basically what I had done yesterday after > Googling and getting a hit or two on the IBM DeveloperWorks site. 
I?m > including some output below which seems to show that I?ve got everything set > up but it?s still not working. > > Am I missing something? We don?t use CCR on our production cluster (and this > experience doesn?t make me eager to do so!), so I?m not that familiar with > it... > > Kevin > > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v > grep" | sort testdellnode1: root 2583 1 0 May30 ? > 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testdellnode1: root 6694 2583 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 2023 5828 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 5828 1 0 Sep18 ? > 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: > root 19356 4628 0 11:19 tty1 > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: > root 4628 1 0 Sep19 tty1 > 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: > root 22149 2983 0 11:16 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: > root 2983 1 0 Sep18 ? > 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: > root 15685 6557 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: > root 6557 1 0 Sep19 ? > 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 29424 6512 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 6512 1 0 Sep18 ? > 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor > 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR > quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr > fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous > error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh > -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: > drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 > root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root > root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: > drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. > 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 1 root root > 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root > root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 > committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks > testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: > -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 > testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: > drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 > root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root > 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 > 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: > -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root > root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 > 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 
2 root root 4096 > Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 > cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes > 3,0,10.0.6.215,,testnsd3.vampire > 1,0,10.0.6.213,,testnsd1.vampire > 2,0,10.0.6.214,,testnsd2.vampire > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" > testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs > testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs > testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs > testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 > 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug > 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 1 root root 20360 Aug > 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" > testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames > "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testdellnode1: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testgateway: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen > root at testnsd2# > > On Sep 20, 2017, at 10:48 AM, Edward Wahl > > wrote: > > I've run into this before. We didn't use to use CCR. And restoring nodes for > us is a major pain in the rear as we only allow one-way root SSH, so we have a > number of useful little scripts to work around problems like this. > > Assuming that you have all the necessary files copied to the correct > places, you can manually kick off CCR. > > I think my script does something like: > > (copy the encryption key info) > > scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ > > scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ > > scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ > > :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor > > you should then see like 2 copies of it running under mmksh. > > Ed > > > On Wed, 20 Sep 2017 13:55:28 +0000 > "Buterbaugh, Kevin L" > > > wrote: > > Hi All, > > testnsd1 and testnsd3 both had hardware issues (power supply and internal HD > respectively). Given that they were 12 year old boxes, we decided to replace > them with other boxes that are a mere 7 years old ? keep in mind that this is > a test cluster. > > Disabling CCR does not work, even with the undocumented ??force? option: > > /var/mmfs/gen > root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force > mmchcluster: Unable to obtain the GPFS configuration file lock. > mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. > mmchcluster: Processing continues without lock protection. 
> The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key > fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key > fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp608.vampire > (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp612.vampire > (10.0.21.12)' can't be established. ECDSA key fingerprint is > SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is > MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's > password: testnsd3.vampire: Host key verification failed. mmdsh: > testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: > Host key verification failed. mmdsh: testnsd1.vampire remote shell process > had return code 255. vmp609.vampire: Host key verification failed. mmdsh: > vmp609.vampire remote shell process had return code 255. vmp608.vampire: > Host key verification failed. mmdsh: vmp608.vampire remote shell process had > return code 255. vmp612.vampire: Host key verification failed. mmdsh: > vmp612.vampire remote shell process had return code 255. > > root at vmp610.vampire's > password: vmp610.vampire: Permission denied, please try again. > > root at vmp610.vampire's > password: vmp610.vampire: Permission denied, please try again. > > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. > > Verifying GPFS is stopped on all nodes ... > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key > fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key > fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp609.vampire > (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. 
ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire > (10.0.6.213)' can't be established. ECDSA key fingerprint is > SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is > MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's > password: > root at vmp610.vampire's > password: > root at vmp610.vampire's > password: > > testnsd3.vampire: Host key verification failed. > mmdsh: testnsd3.vampire remote shell process had return code 255. > vmp612.vampire: Host key verification failed. > mmdsh: vmp612.vampire remote shell process had return code 255. > vmp608.vampire: Host key verification failed. > mmdsh: vmp608.vampire remote shell process had return code 255. > vmp609.vampire: Host key verification failed. > mmdsh: vmp609.vampire remote shell process had return code 255. > testnsd1.vampire: Host key verification failed. > mmdsh: testnsd1.vampire remote shell process had return code 255. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. mmchcluster: Command failed. > Examine previous error messages to determine cause. /var/mmfs/gen > root at testnsd2# > > I believe that part of the problem may be that there are 4 client nodes that > were removed from the cluster without removing them from the cluster (done by > another SysAdmin who was in a hurry to repurpose those machines). They?re up > and pingable but not reachable by GPFS anymore, which I?m pretty sure is > making things worse. > > Nor does Loic?s suggestion of running mmcommon work (but thanks for the > suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to > start the cluster up failed: > > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# > > Thanks. > > Kevin > > On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > > wrote: > > > Hi Kevin, > > Let's me try to understand the problem you have. What's the meaning of node > died here. Are you mean that there are some hardware/OS issue which cannot be > fixed and OS cannot be up anymore? > > I agree with Bob that you can have a try to disable CCR temporally, restore > cluster configuration and enable it again. > > Such as: > > 1. Login to a node which has proper GPFS config, e.g NodeA > 2. Shutdown daemon in all client cluster. > 3. mmchcluster --ccr-disable -p NodeA > 4. mmsdrrestore -a -p NodeA > 5. mmauth genkey propagate -N testnsd1, testnsd3 > 6. 
mmchcluster --ccr-enable > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in other > countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run > across this before, and it?s because of a bug (as I recall) having to do with > CCR and > > From: "Oesterlin, Robert" > > > To: gpfsug main discussion list > > > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for > the count? Sent by: > gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > OK ? I?ve run across this before, and it?s because of a bug (as I recall) > having to do with CCR and quorum. What I think you can do is set the cluster > to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back > up and then re-enable ccr. > > I?ll see if I can find this in one of the recent 4.2 release nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > > > on behalf of "Buterbaugh, Kevin L" > > > Reply-To: gpfsug main discussion list > > > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > > > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? > > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD > servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while > back. I did nothing about it at the time because it was due to be life-cycled > as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now > resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve > done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also > done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from > testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to > testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? > ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any > crucial details here ? 
does anyone have an idea what I need to do? Thanks? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 > > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From tarak.patel at canada.ca Wed Sep 20 21:23:00 2017 From: tarak.patel at canada.ca (Patel, Tarak (SSC/SPC)) Date: Wed, 20 Sep 2017 20:23:00 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: Hi, Recently we deployed 3 sets of CES nodes where we are using LDAP for authentication service. We had to create a user in ldap which was used by 'mmuserauth service create' command. Note that SMB needs to be disabled ('mmces service disable smb') if not being used before issuing 'mmuserauth service create'. By default, CES deployment enables SMB (' spectrumscale config protocols'). Tarak -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September, 2017 14:55 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." 
I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. 
mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but not > for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the > NFS client tells you". This of course only works sanely if each NFS > export is only to a set of machines in the same administrative domain > that manages their UID/GIDs. Exporting to two sets of machines that > don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpi > Bv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiy > liSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ > 0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGV > srSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwC > YeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbj > XI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuv > EeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discus > s > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org 
http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chetkulk at in.ibm.com Thu Sep 21 06:33:53 2017 From: chetkulk at in.ibm.com (Chetan R Kulkarni) Date: Thu, 21 Sep 2017 11:03:53 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu>, , Message-ID: Hi Jonathon, I can configure file userdefined authentication with only NFS enabled/running on my test setup (SMB was disabled). Please check if following steps help fix your issue: 1> remove existing file auth if any /usr/lpp/mmfs/bin/mmuserauth service remove --data-access-method file 2> disable smb service /usr/lpp/mmfs/bin/mmces service disable smb /usr/lpp/mmfs/bin/mmces service list -a 3> configure userdefined file auth /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined 4> if above fails retry mmuserauth in debug mode as below and please share error log /tmp/userdefined.log. Also share spectrum scale version you are running with. export DEBUG=1; /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined > /tmp/userdefined.log 2>&1; unset DEBUG /usr/lpp/mmfs/bin/mmdiag --version 5> if mmuserauth succeeds in step 3> above; you also need to correct your mmnfs cli command as below. You missed to type in Access_Type= and Squash= in client definition. mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu (Access_Type=rw,Squash=root_squash);dtn*.rc.int.colorado.edu (Access_Type=rw,Squash=root_squash)' Thanks, Chetan. From: Jonathon A Anderson To: gpfsug main discussion list Date: 09/21/2017 12:25 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. 
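(A quick way to see which protocol services and which authentication scheme a CES node is actually running - useful when deciding whether SMB is needed at all - is a check along these lines; mmuserauth service list is assumed here to be available in the release in use:

/usr/lpp/mmfs/bin/mmces service list -a      # per-node enabled/running state of NFS, SMB and OBJ
/usr/lpp/mmfs/bin/mmuserauth service list    # currently configured file and object authentication

If SMB shows up enabled only because the installer toolkit enables it by default, disabling it with mmces service disable smb before running mmuserauth, as suggested in the replies above, is what lets the userdefined method go through.)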
HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu (rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org 
https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=AliY037R_W1y8Ym6nPI1XDP2yCq47JwtTPhj9IppwOM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From andreas.mattsson at maxiv.lu.se Thu Sep 21 13:09:29 2017 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 21 Sep 2017 12:09:29 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: References: , Message-ID: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se> Since I solved this old issue a long time ago, I'd thought I'd come back and report the solution in case someone else encounters similar problems in the future. Original problem reported by users: Copying files between folders on NFS exports from a CES server gave random timestamps on the files. Also, apart from the initial reported problem, there where issues where users sometimes couldn't change or delete files that they where owners of. Background: We have a Active Directory with RFC2307 posix attributes populated, and use the built in Winbind-based AD authentication with RFC2307 ID mapping of our Spectrum Scale CES protocol servers. All our Linux clients and servers are also AD integrated, using Nslcd and nss-pam-ldapd. 
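Before getting into the trigger and cause, a quick way to see the underlying difference for yourself is to resolve the same AD group by its GID on both sides and compare the spelling of the name that comes back (the GID 1234 below is just a placeholder, substitute one of your own RFC2307 groups):

# On a CES protocol node (winbind-based file authentication):
getent group 1234    # the group name is returned folded to lower case
# On a client running nslcd / nss-pam-ldapd:
getent group 1234    # the group name keeps the mixed case stored in AD

Same GID on both sides, but two different spellings of the group name, and that mismatch is what the rest of this mail comes down to.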
Trigger: If a user was part of a AD group with a mixed case name, and this group gave access to a folder, and the NFS mount was done using NFSv4, the behavior in my original post occurred when copying or changing files in that folder. Cause: Active Directory handle LDAP-requests case insensitive, but results are returned with case retained. Winbind and SSSD-AD converts groups and usernames to lower case. Nslcd retains case. We run NFS with managed GIDs. Managed GIDs in NFSv3 seems to be handled case insensitive, or to ignore the actual group name after it has resolved the GID-number of the group, while NFSv4 seems to handle group names case sensitive and check the actual group name for certain operations even if the GID-number matches. Don't fully understand the mechanism behind why certain file operations would work but others not, but in essence a user would be part of a group called "UserGroup" with GID-number 1234 in AD and on the client, but would be part of a group called "usergroup" with GID-number 1234 on the CES server. Any operation that's authorized on the GID-number, or a case insensitive lookup of the group name, would work. Any operation authorized by a case sensitive group lookup would fail. Three different workarounds where found to work: 1. Rename groups and users to lower case in AD 2. Change from Nslcd to either SSSD or Winbind on the clients 3. Change from NFSv4 to NFSv3 when mounting NFS Remember to clear ID-mapping caches. Regards, Andreas ___________________________________ [https://mail.google.com/mail/u/0/?ui=2&ik=b0a6f02971&view=att&th=14618fab2daf0e10&attid=0.1.1&disp=emb&zw&atsh=1] Andreas Mattsson System Engineer MAX IV Laboratory Lund University Tel: +46-706-649544 E-mail: andreas.mattsson at maxlab.lu.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Stephen Ulmer Skickat: den 3 februari 2017 14:35:21 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES Does the cp actually complete? As in, does it copy all of the blocks? What?s the exit code? A cp?d file should have ?new? metadata. That is, it should have it?s own dates, owners, etc. (not necessarily copied from the source file). I ran ?strace cp foo1 foo2?, and it was pretty instructive, maybe that would get you more info. On CentOS strace is in it?s own package, YMMV. -- Stephen On Feb 3, 2017, at 8:19 AM, Andreas Mattsson > wrote: That works. ?touch test100? Feb 3 14:16 test100 ?cp test100 test101? Feb 3 14:16 test100 Apr 21 2027 test101 ?touch ?r test100 test101? Feb 3 14:16 test100 Feb 3 14:16 test101 /Andreas That?s a cool one. :) What if you use the "random date" file as a time reference to touch another file (like, 'touch -r file02 file03?)? -- Stephen On Feb 3, 2017, at 7:46 AM, Andreas Mattsson > wrote: I?m having some really strange timestamp behaviour when doing file operations on NFS mounts shared via CES on spectrum scale 4.2.1.1 The NFS clients are up to date Centos and Debian machines. All Scale servers and NFS clients have correct date and time via NTP. Creating a file, for instance ?touch file00?, gives correct timestamp. Moving the file, ?mv file00 file01?, gives correct timestamp Copying the file, ?cp file01 file02?, gives a random timestamp anywhere in time, for instance Oct 12 2095 or Feb 29 1976 or something similar. This is only via NFS. Copying the file via a native gpfs-mount or via SMB gives a correct timestamp. 
Doing the same operation over NFS to other NFS-servers works correct, it is only when operating on the NFS-share from the Spectrum Scale CES the issue occurs. Have anyone seen this before? Regards, Andreas Mattsson _____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 225 94 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylorm at us.ibm.com Thu Sep 21 15:33:00 2017 From: taylorm at us.ibm.com (Michael L Taylor) Date: Thu, 21 Sep 2017 07:33:00 -0700 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Hi Jonathon, We were able to run this scenario successfully in our lab at the latest released 4.2.3.4. # /usr/lpp/mmfs/bin/mmdiag --version === mmdiag: version === Current GPFS build: "4.2.3.4 ". # /usr/lpp/mmfs/bin/mmces service list -a Enabled services: NFS node1.test.ibm.com: NFS is running # /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined File authentication configuration completed successfully. # rpm -qa | grep gpfs gpfs.ext-4.2.3-4.x86_64 gpfs.docs-4.2.3-4.noarch gpfs.gskit-8.0.50-75.x86_64 gpfs.gpl-4.2.3-4.noarch gpfs.msg.en_US-4.2.3-4.noarch nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 gpfs.base-4.2.3-4.x86_64 # rpm -qa | grep nfs-gan nfs-ganesha-utils-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/20/2017 12:07 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 42 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=BpVUgvFT2Qwgw0hveEgQaHFwn2mjeQjeBrkXHX_aC0A&m=2oGcWc1xx6zOclryoU2BdJykABuIR118zXTmSAA8msU&s=7q0JMYVHMSGlUAYquNMlrDRF6BDj6-76Oc4VbXrvlHE&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: export nfs share on gpfs with no authentication (Jonathon A Anderson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 20 Sep 2017 18:55:04 +0000 From: Jonathon A Anderson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Content-Type: text/plain; charset="us-ascii" I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. 
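(For anyone following the thread: pulling the working 4.2.3.4 lab run above together, the NFS-only sequence looks roughly like this. The export path and client pattern are simply the examples already used in this thread, the enable step can be skipped if NFS is already running, and the final list command is only there to confirm the export was created.)

/usr/lpp/mmfs/bin/mmces service enable NFS
/usr/lpp/mmfs/bin/mmces service list -a
/usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined
/usr/lpp/mmfs/bin/mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash)'
/usr/lpp/mmfs/bin/mmnfs export list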
________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu (rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Sep 21 18:09:52 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 21 Sep 2017 17:09:52 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <20170920150739.39f0a4a0@osc.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> <20170920150739.39f0a4a0@osc.edu> Message-ID: Hi All, Ralf Eberhard of IBM helped me resolve this off list. The key was to temporarily make testnsd1 and testnsd3 not be quorum nodes by making sure GPFS was down and then executing: mmchnode --nonquorum -N testnsd1,testnsd3 --force That gave me some scary messages about overriding normal GPFS quorum semantics, but nce that was done I was able to run an ?mmstartup -a? and bring up the cluster! Once it was up and I had verified things were working properly I then shut it back down so that I could rerun the mmchnode (without the ?force) to make testnsd1 and testnsd3 quorum nodes again. Thanks to all who helped me out here? Kevin On Sep 20, 2017, at 2:07 PM, Edward Wahl > wrote: So who was the ccrmaster before? What is/was the quorum config? (tiebreaker disks?) 
what does 'mmccr check' say? Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info from the error? Ed On Wed, 20 Sep 2017 16:27:48 +0000 "Buterbaugh, Kevin L" > wrote: Hi Ed, Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... Kevin /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 2983 1 0 Sep18 ? 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 6512 1 0 Sep18 ? 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 
1 root root 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes 3,0,10.0.6.215,,testnsd3.vampire 1,0,10.0.6.213,,testnsd1.vampire 2,0,10.0.6.214,,testnsd2.vampire /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen root at testnsd2# On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: I've run into this before. We didn't use to use CCR. And restoring nodes for us is a major pain in the rear as we only allow one-way root SSH, so we have a number of useful little scripts to work around problems like this. Assuming that you have all the necessary files copied to the correct places, you can manually kick off CCR. 
I think my script does something like: (copy the encryption key info) scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor you should then see like 2 copies of it running under mmksh. Ed On Wed, 20 Sep 2017 13:55:28 +0000 "Buterbaugh, Kevin L" > wrote: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. 
ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? 
I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. 
Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cfabfdb4659d249e2d20308d5005ae1ab%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415312700069585&sdata=Z59ik0w%2BaK6bV2JsDxSNt%2FsqwR1ESuqkXTQVBlRjDgw%3D&reserved=0 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 21 19:49:29 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 21 Sep 2017 11:49:29 -0700 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC In-Reply-To: References: Message-ID: Registration space is getting tight. We decided on a room reconfiguration today to make a little more room. So if you tried to register and were told it was full try again. If it fills up again and you want to register, but can?t drop me an email and I?ll see what we can do. Best, Kristy > On Sep 20, 2017, at 9:00 AM, Kristy Kallback-Rose wrote: > > Thanks Doug. > > If you plan to go, *do register*. GPFS Day is free, but we need to know how many will attend. Register using the link on the HPCXXL event page below. > > Cheers, > Kristy > >> On Sep 20, 2017, at 1:28 AM, Douglas O'flaherty > wrote: >> >> >> Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. 
>> >> >> For more information >> http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ >> >> Doug >> >> Mobile >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Fri Sep 22 23:08:58 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Fri, 22 Sep 2017 22:08:58 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se> References: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se>, , Message-ID: An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Fri Sep 22 23:10:45 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Fri, 22 Sep 2017 22:10:45 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: References: , <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se>, , Message-ID: An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Sun Sep 24 19:04:59 2017 From: bipcuds at gmail.com (Keith Ball) Date: Sun, 24 Sep 2017 14:04:59 -0400 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Message-ID: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. }, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. 
changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 <%28540%29%20557-7851> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Sun Sep 24 20:29:10 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Sun, 24 Sep 2017 12:29:10 -0700 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? In-Reply-To: References: Message-ID: Hi Keith, We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues with Zimon. However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time like with any round robin style data collection scheme. So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI which uses a postgres db (iirc, about to take off on a flight and can't check), or looked at the Grafana bridge which would get data into OpenTsdb format, again iirc. Anyway, just some things for consideration and a request to share back whatever you find out if it's off list. Thanks, getting stink eye to go to airplane mode. More later. Cheers Kristy On Sep 24, 2017 11:05 AM, "Keith Ball" wrote: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. 
}, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 <%28540%29%20557-7851> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkomandu at in.ibm.com Mon Sep 25 06:26:15 2017 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Mon, 25 Sep 2017 10:56:15 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Jonathon, This requires SMB service when you are at 422 PTF2. As Mike pointed out if you upgrade to the 4.2.3-3/4 build you will no longer hit that issue With Regards, Ravi K Komanduri Email:rkomandu at in.ibm.com From: "Michael L Taylor" To: gpfsug-discuss at spectrumscale.org Date: 09/21/2017 08:03 PM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathon, We were able to run this scenario successfully in our lab at the latest released 4.2.3.4. # /usr/lpp/mmfs/bin/mmdiag --version === mmdiag: version === Current GPFS build: "4.2.3.4 ". # /usr/lpp/mmfs/bin/mmces service list -a Enabled services: NFS node1.test.ibm.com: NFS is running # /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined File authentication configuration completed successfully. 
# rpm -qa | grep gpfs gpfs.ext-4.2.3-4.x86_64 gpfs.docs-4.2.3-4.noarch gpfs.gskit-8.0.50-75.x86_64 gpfs.gpl-4.2.3-4.noarch gpfs.msg.en_US-4.2.3-4.noarch nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 gpfs.base-4.2.3-4.x86_64 # rpm -qa | grep nfs-gan nfs-ganesha-utils-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/20/2017 12:07 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 42 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=BpVUgvFT2Qwgw0hveEgQaHFwn2mjeQjeBrkXHX_aC0A&m=2oGcWc1xx6zOclryoU2BdJykABuIR118zXTmSAA8msU&s=7q0JMYVHMSGlUAYquNMlrDRF6BDj6-76Oc4VbXrvlHE&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: export nfs share on gpfs with no authentication (Jonathon A Anderson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 20 Sep 2017 18:55:04 +0000 From: Jonathon A Anderson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Content-Type: text/plain; charset="us-ascii" I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. 
Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=ilYETqcaNr1y1ulWWDPjVg_X9pt35O1eYBTyFwJP56Y&m=VW8gJLSqT4rru6lFZXxCFp-Y3ngi6IUydv5czoG8kTE&s=deIQZQr-qfqLqW377yNysTJI8y7QJOdbokVjlnDr2d8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Mon Sep 25 08:40:34 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 25 Sep 2017 07:40:34 +0000 Subject: [gpfsug-discuss] SPectrum Scale on AWS Message-ID: I guess this is not news on this list, however I did see a reference to SpectrumScale on The Register this morning, which linked to this paper: https://s3.amazonaws.com/quickstart-reference/ibm/spectrum/scale/latest/doc/ibm-spectrum-scale-on-the-aws-cloud.pdf The article is here https://www.theregister.co.uk/2017/09/25/storage_super_club_sandwich/ 12 Terabyte Helium drives now available. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikeowen at thinkboxsoftware.com Mon Sep 25 10:26:21 2017 From: mikeowen at thinkboxsoftware.com (Mike Owen) Date: Mon, 25 Sep 2017 10:26:21 +0100 Subject: [gpfsug-discuss] SPectrum Scale on AWS In-Reply-To: References: Message-ID: Full PR release below: https://aws.amazon.com/about-aws/whats-new/2017/09/deploy-ibm-spectrum-scale-on-the-aws-cloud-with-new-quick-start/ Posted On: Sep 13, 2017 This new Quick Start automatically deploys a highly available IBM Spectrum Scale cluster with replication on the Amazon Web Services (AWS) Cloud, into a configuration of your choice. (A small cluster can be deployed in about 25 minutes.) IBM Spectrum Scale is a flexible, software-defined storage solution that can be deployed as highly available, high-performance file storage. 
It can scale in several dimensions, including performance (bandwidth and IOPS), capacity, and number of nodes that can mount the file system. The product?s high performance and scalability helps address the needs of applications whose performance (or performance-to-capacity ratio) demands cannot be met by traditional scale-up storage systems. The IBM Spectrum Scale software is being made available through a 90-day trial license evaluation program. This Quick Start automates the deployment of IBM Spectrum Scale on AWS for users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. The Quick Start deploys IBM Network Shared Disk (NSD) storage server instances and IBM Spectrum Scale compute instances into a virtual private cloud (VPC) in your AWS account. Data and metadata elements are replicated across two Availability Zones for optimal data protection. You can build a new VPC for IBM Spectrum Scale, or deploy the software into your existing VPC. The automated deployment provisions the IBM Spectrum Scale instances in Auto Scaling groups for instance scaling and management. The deployment and configuration tasks are automated by AWS CloudFormation templates that you can customize during launch. You can also use the templates as a starting point for your own implementation, by downloading them from the GitHub repository . The Quick Start includes a guide with step-by-step deployment and configuration instructions. To get started with IBM Spectrum Scale on AWS, use the following resources: - View the architecture and details - View the deployment guide - Browse and launch other AWS Quick Start reference deployments On 25 September 2017 at 08:40, John Hearns wrote: > I guess this is not news on this list, however I did see a reference to > SpectrumScale on The Register this morning, > > which linked to this paper: > > https://s3.amazonaws.com/quickstart-reference/ibm/ > spectrum/scale/latest/doc/ibm-spectrum-scale-on-the-aws-cloud.pdf > > > > The article is here https://www.theregister.co.uk/ > 2017/09/25/storage_super_club_sandwich/ > > 12 Terabyte Helium drives now available. > > > > > -- The information contained in this communication and any attachments is > confidential and may be privileged, and is for the sole use of the intended > recipient(s). Any unauthorized review, use, disclosure or distribution is > prohibited. Unless explicitly stated otherwise in the body of this > communication or the attachment thereto (if any), the information is > provided on an AS-IS basis without any express or implied warranties or > liabilities. To the extent you are relying on this information, you are > doing so at your own risk. If you are not the intended recipient, please > notify the sender immediately by replying to this message and destroy all > copies of this message and any attachments. Neither the sender nor the > company/group of companies he or she represents shall be liable for the > proper and complete transmission of the information contained in this > communication, or for any delay in its receipt. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Mon Sep 25 12:42:15 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 25 Sep 2017 11:42:15 +0000 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Message-ID: <018DE6B7-ADE3-4A01-B23C-9DB668FD95DB@nuance.com> Another data point for Keith/Kristy, I?ve been using Zimon for about 18 months now, and I?ll have to admit it?s been less than robust for long-term data. The biggest issue I?ve run into is the stability of the collector process. I have it crash on a fairly regular basis, most due to memory usage. This results in data loss You can configure it in a highly-available mode that should mitigate this to some degree. However, I don?t think IBM has published any details on how reliable the data collection process is. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Sunday, September 24, 2017 at 2:29 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Hi Keith, We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues with Zimon. However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time like with any round robin style data collection scheme. So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI which uses a postgres db (iirc, about to take off on a flight and can't check), or looked at the Grafana bridge which would get data into OpenTsdb format, again iirc. Anyway, just some things for consideration and a request to share back whatever you find out if it's off list. Thanks, getting stink eye to go to airplane mode. More later. Cheers Kristy On Sep 24, 2017 11:05 AM, "Keith Ball" > wrote: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. }, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. 
filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Sep 25 15:35:33 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 25 Sep 2017 14:35:33 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Message-ID: <1506350132.352.17.camel@imperial.ac.uk> Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Mon Sep 25 22:41:11 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 25 Sep 2017 21:41:11 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: <1506350132.352.17.camel@imperial.ac.uk> References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From christof.schmitt at us.ibm.com Mon Sep 25 22:41:11 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 25 Sep 2017 21:41:11 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: <1506350132.352.17.camel@imperial.ac.uk> References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 09:22:05 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 08:22:05 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 09:22:05 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 08:22:05 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: Hi Christof, thanks I?ll try it on a test cluster. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 10:59:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 09:59:13 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? 
Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 10:59:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 09:59:13 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 
Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Sep 26 21:49:09 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 26 Sep 2017 20:49:09 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: , <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Sep 27 09:02:51 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Sep 2017 08:02:51 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: , <1506350132.352.17.camel@imperial.ac.uk> Message-ID: I?m sorry, you?re right. I can only assume my brain was looking for an SID entry so when I saw Everyone:ALLOWED/FULL it didn?t process it at all. 4.2.3-4: [root at cesnode ~]# mmsmb exportacl list [testces] ACL:\Everyone:ALLOWED/FULL From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 26 September 2017 21:49 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? The default for the "export ACL" is always to allow access to "Everyone", so that the the "export ACL" does not limit access by default, but only the file system ACL. I do not have systems with these code levels at hand, could you show the difference you see between PTF2 and PTF4? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: "gpfsug-discuss at gpfsug.org" > Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Tue, Sep 26, 2017 2:59 AM There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. 
Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list > Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=B-AqKIRCmLBzoWAhGn7NY-ZASOX25NuP_c_ndE8gy4A&s=S06OD3mbRedYjfwETO8tUnlOjnWT7pOX8nsYX5ebIdA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Wed Sep 27 09:16:49 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Wed, 27 Sep 2017 10:16:49 +0200 Subject: [gpfsug-discuss] el7.4 compatibility Message-ID: Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! 
Kenneth From michael.holliday at crick.ac.uk Wed Sep 27 09:25:58 2017 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 27 Sep 2017 08:25:58 +0000 Subject: [gpfsug-discuss] File Quotas vs Inode Limits Message-ID: Hi All, I'm in the process of setting up quotas for our users. We currently have block quotas per fileset, and an inode limit for each inode space. Our users have requested more transparency relating to the inode limit, as at the moment they can't see any information about it. Are there any disadvantages to implementing file quotas, and increasing the inode limits so that they will not be reached? Michael Michael Holliday HPC Systems Engineer Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Sep 27 14:59:08 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Sep 2017 13:59:08 +0000 Subject: [gpfsug-discuss] File Quotas vs Inode Limits In-Reply-To: References: Message-ID: Actually, you will get a benefit in that you can set up a callback so that users get alerted when they go over a soft quota. We also set up a fileset quota so that the callback will automatically notify users when they exceed their block and file quotas for their fileset as well. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Michael Holliday Sent: Wednesday, September 27, 2017 4:26 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] File Quotas vs Inode Limits Note: External Email ________________________________ Hi All, I'm in process of setting up quota for our users. We currently have block quotas per file set, and an inode limit for each inode space. Our users have request more transparency relating to the inode limit as as it is they can't see any information. Are there any disadvantages to implementing file quotas, and increasing the inode limits so that they will not be reached? Michael Michael Holliday HPC Systems Engineer Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
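As a sketch of the kind of callback Bryan describes, something along the following lines could be registered; the softQuotaExceeded event name and the %-variables should be checked against the mmaddcallback documentation for the release in use, and the notification script is entirely hypothetical:

# Hypothetical example -- confirm the event name and the available
# %-variables in the mmaddcallback man page before relying on this.
mmaddcallback quotaMailer \
    --command /usr/local/sbin/quota_mailer.sh \
    --event softQuotaExceeded \
    --parms "%eventName %fsName"

# /usr/local/sbin/quota_mailer.sh could then be as simple as:
#!/bin/bash
EVENT="$1"
FSNAME="$2"
mail -s "GPFS quota warning on $FSNAME" hpc-support@example.com <<EOF
Event $EVENT was raised on file system $FSNAME.
Run 'mmrepquota -u $FSNAME' to see which users are over their soft limits.
EOF

mmrepquota -u reports per-user usage against the soft and hard limits, which gives the recipients something concrete to act on.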
URL: From Greg.Lehmann at csiro.au Thu Sep 28 00:44:53 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 27 Sep 2017 23:44:53 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: Message-ID: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Thu Sep 28 14:21:34 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Sep 2017 13:21:34 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: Please review this site: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au Sent: Wednesday, September 27, 2017 6:45 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] el7.4 compatibility Note: External Email ------------------------------------------------- I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From JRLang at uwyo.edu Thu Sep 28 15:18:52 2017 From: JRLang at uwyo.edu (Jeffrey R. 
Lang) Date: Thu, 28 Sep 2017 14:18:52 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: I just tired to build the GPFS GPL module against the latest version of RHEL 7.4 kernel and the build fails. The link below show that it should work. cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread kdump-kern.o: In function `GetOffset': kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' kdump-kern.o: In function `KernInit': kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' collect2: error: ld returned 1 exit status make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# uname -a Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux [root at bkupsvr3 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "4.2.2.3 ". Built on Mar 16 2017 at 11:19:59 In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my case 514.26.2 If I'm missing something can some one point me in the right direction? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, September 28, 2017 8:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Please review this site: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au Sent: Wednesday, September 27, 2017 6:45 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] el7.4 compatibility Note: External Email ------------------------------------------------- I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. 
Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Thu Sep 28 15:22:54 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 28 Sep 2017 16:22:54 +0200 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: <20170928142254.xwjvp3qwnilazer7@ics.muni.cz> You need 4.2.3.4 GPFS version and it will work. On Thu, Sep 28, 2017 at 02:18:52PM +0000, Jeffrey R. Lang wrote: > I just tired to build the GPFS GPL module against the latest version of RHEL 7.4 kernel and the build fails. The link below show that it should work. > > cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread > kdump-kern.o: In function `GetOffset': > kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' > kdump-kern.o: In function `KernInit': > kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' > collect2: error: ld returned 1 exit status > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# uname -a > Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux > [root at bkupsvr3 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "4.2.2.3 ". > Built on Mar 16 2017 at 11:19:59 > > In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my case 514.26.2 > > If I'm missing something can some one point me in the right direction? > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister > Sent: Thursday, September 28, 2017 8:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] el7.4 compatibility > > Please review this site: > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > > Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au > Sent: Wednesday, September 27, 2017 6:45 PM > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] el7.4 compatibility > > Note: External Email > ------------------------------------------------- > > I guess I may as well ask about SLES 12 SP3 as well! TIA. 
> > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman > Sent: Wednesday, 27 September 2017 6:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] el7.4 compatibility > > Hi, > > Is there already some information available of gpfs (and protocols) on > el7.4 ? > > Thanks! > > Kenneth > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek From S.J.Thompson at bham.ac.uk Thu Sep 28 15:23:53 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 28 Sep 2017 14:23:53 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: The 7.4 kernels are listed as having been tested by IBM. Having said that, we have clients running 7.4 kernel and its OK, but we are 4.2.3.4efix2, so bump versions... Simon On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jeffrey R. Lang" wrote: >I just tired to build the GPFS GPL module against the latest version of >RHEL 7.4 kernel and the build fails. The link below show that it should >work. > >cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >kdump-kern.o: In function `GetOffset': >kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >kdump-kern.o: In function `KernInit': >kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >collect2: error: ld returned 1 exit status >make[1]: *** [modules] Error 1 >make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >make: *** [Modules] Error 1 >-------------------------------------------------------- >mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >-------------------------------------------------------- >mmbuildgpl: Command failed. Examine previous error messages to determine >cause. 
>[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# uname -a >Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >[root at bkupsvr3 ~]# mmdiag --version > >=== mmdiag: version === >Current GPFS build: "4.2.2.3 ". >Built on Mar 16 2017 at 11:19:59 > >In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >case 514.26.2 > >If I'm missing something can some one point me in the right direction? > > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >Banister >Sent: Thursday, September 28, 2017 8:22 AM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Please review this site: > >https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > >Hope that helps, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >Greg.Lehmann at csiro.au >Sent: Wednesday, September 27, 2017 6:45 PM >To: gpfsug-discuss at spectrumscale.org >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Note: External Email >------------------------------------------------- > >I guess I may as well ask about SLES 12 SP3 as well! TIA. > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >Waegeman >Sent: Wednesday, 27 September 2017 6:17 PM >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] el7.4 compatibility > >Hi, > >Is there already some information available of gpfs (and protocols) on >el7.4 ? > >Thanks! > >Kenneth > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Thu Sep 28 15:36:04 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 28 Sep 2017 16:36:04 +0200 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: > The 7.4 kernels are listed as having been tested by IBM. Hi, Were did you find this? > > Having said that, we have clients running 7.4 kernel and its OK, but we > are 4.2.3.4efix2, so bump versions... Do you have some information about the efix2? Is this for 7.4 ? And where should we find this :-) Thank you! Kenneth > > Simon > > On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Jeffrey R. Lang" JRLang at uwyo.edu> wrote: > >> I just tired to build the GPFS GPL module against the latest version of >> RHEL 7.4 kernel and the build fails. The link below show that it should >> work. >> >> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >> kdump-kern.o: In function `GetOffset': >> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >> kdump-kern.o: In function `KernInit': >> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >> collect2: error: ld returned 1 exit status >> make[1]: *** [modules] Error 1 >> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >> make: *** [Modules] Error 1 >> -------------------------------------------------------- >> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >> -------------------------------------------------------- >> mmbuildgpl: Command failed. Examine previous error messages to determine >> cause. >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# uname -a >> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >> [root at bkupsvr3 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "4.2.2.3 ". >> Built on Mar 16 2017 at 11:19:59 >> >> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >> case 514.26.2 >> >> If I'm missing something can some one point me in the right direction? 
>> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >> Banister >> Sent: Thursday, September 28, 2017 8:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] el7.4 compatibility >> >> Please review this site: >> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html >> >> Hope that helps, >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >> Greg.Lehmann at csiro.au >> Sent: Wednesday, September 27, 2017 6:45 PM >> To: gpfsug-discuss at spectrumscale.org >> Subject: Re: [gpfsug-discuss] el7.4 compatibility >> >> Note: External Email >> ------------------------------------------------- >> >> I guess I may as well ask about SLES 12 SP3 as well! TIA. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >> Waegeman >> Sent: Wednesday, 27 September 2017 6:17 PM >> To: gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] el7.4 compatibility >> >> Hi, >> >> Is there already some information available of gpfs (and protocols) on >> el7.4 ? >> >> Thanks! >> >> Kenneth >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, >> and to please notify the sender immediately and destroy this email and >> any attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to >> the completeness or accuracy of this email or any attachments. This email >> is for informational purposes only and does not constitute a >> recommendation, offer, request or solicitation of any kind to buy, sell, >> subscribe, redeem or perform any type of transaction of a financial >> product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Sep 28 15:45:25 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 28 Sep 2017 14:45:25 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Aren't listed as tested Sorry ... 
4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but we >> are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf >> of Jeffrey R. Lang" >of >> JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version of >>> RHEL 7.4 kernel and the build fails. The link below show that it >>>should >>> work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine >>> cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >>> case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.ht >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. 
>>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >>> Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) >>> only and may contain proprietary, confidential or privileged >>>information. >>> If you are not the intended recipient, you are hereby notified that any >>> review, dissemination or copying of this email is strictly prohibited, >>> and to please notify the sender immediately and destroy this email and >>> any attachments. Email transmission cannot be guaranteed to be secure >>>or >>> error-free. The Company, therefore, does not make any guarantees as to >>> the completeness or accuracy of this email or any attachments. This >>>email >>> is for informational purposes only and does not constitute a >>> recommendation, offer, request or solicitation of any kind to buy, >>>sell, >>> subscribe, redeem or perform any type of transaction of a financial >>> product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From aaron.s.knister at nasa.gov Fri Sep 29 02:59:39 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Fri, 29 Sep 2017 01:59:39 +0000 Subject: [gpfsug-discuss] Latest recommended 4.2 efix? Message-ID: Hi Everyone, What?s the latest recommended efix release for 4.2.3.4? I?m working on testing a 4.1 to 4.2 migration and was reminded today of some fun bugs in 4.2.3.4 for which I think there are efixes. Alternatively, any word on a 4.2.3.5 release date? -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 29 10:02:26 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 29 Sep 2017 09:02:26 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Simon, I would appreciate a heads up on that AFM issue. I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is if a remote NFS mount goes down then an asynchronous operation such as a read can be stopped. I must admit to being not clued up on how the efixes are distributed. I downloaded the 4.2.3.4 installer for Linux yesterday. 
Should I be searching for additional fix packs on top of that (which I am in fact doing now). John H -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, September 28, 2017 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Aren't listed as tested Sorry ... 4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but >> we are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf of Jeffrey R. Lang" >on behalf of JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version >>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>should work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>> my case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? 
>>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Kenneth Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) >>> on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named >>>addressee(s) only and may contain proprietary, confidential or >>>privileged information. >>> If you are not the intended recipient, you are hereby notified that >>>any review, dissemination or copying of this email is strictly >>>prohibited, and to please notify the sender immediately and destroy >>>this email and any attachments. Email transmission cannot be >>>guaranteed to be secure or error-free. The Company, therefore, does >>>not make any guarantees as to the completeness or accuracy of this >>>email or any attachments. This email is for informational purposes >>>only and does not constitute a recommendation, offer, request or >>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>any type of transaction of a financial product. 
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >> sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >> rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >> 39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >> pw%3D&reserved=0 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6pw%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From r.sobey at imperial.ac.uk Fri Sep 29 10:04:49 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 29 Sep 2017 09:04:49 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Efixes (in my one time only limited experience!) come direct from IBM as a result of a PMR. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 29 September 2017 10:02 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Simon, I would appreciate a heads up on that AFM issue. I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is if a remote NFS mount goes down then an asynchronous operation such as a read can be stopped. 
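For reference, a rough sketch of the sequence being discussed in this thread, assuming root access on the affected node and the standard /usr/lpp/mmfs/bin command path; this is not an official upgrade procedure, just the checks and the portability-layer rebuild step the posts above refer to:

# confirm which kernel is booted and which GPFS level is installed
uname -r
/usr/lpp/mmfs/bin/mmdiag --version

# after updating the GPFS packages to a level that supports the kernel (e.g. 4.2.3.4), rebuild the portability layer
/usr/lpp/mmfs/bin/mmbuildgpl

# only once all nodes run the new level, commit the cluster to it
/usr/lpp/mmfs/bin/mmchconfig release=LATEST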
I must admit to being not clued up on how the efixes are distributed. I downloaded the 4.2.3.4 installer for Linux yesterday. Should I be searching for additional fix packs on top of that (which I am in fact doing now). John H -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, September 28, 2017 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Aren't listed as tested Sorry ... 4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but >> we are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf of Jeffrey R. Lang" >on behalf of JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version >>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>should work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>> my case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? 
>>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Kenneth Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) >>> on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named >>>addressee(s) only and may contain proprietary, confidential or >>>privileged information. >>> If you are not the intended recipient, you are hereby notified that >>>any review, dissemination or copying of this email is strictly >>>prohibited, and to please notify the sender immediately and destroy >>>this email and any attachments. Email transmission cannot be >>>guaranteed to be secure or error-free. The Company, therefore, does >>>not make any guarantees as to the completeness or accuracy of this >>>email or any attachments. This email is for informational purposes >>>only and does not constitute a recommendation, offer, request or >>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>any type of transaction of a financial product. 
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >> sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >> rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >> 39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >> pw%3D&reserved=0 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6pw%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri Sep 29 10:39:43 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 29 Sep 2017 09:39:43 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Correct they some from IBM support. The AFM issue we have (and is fixed in the efix) is if you have client code running on the AFM cache that uses truncate. The AFM write coalescing processing does something funny with it, so the file isn't truncated and then the data you write afterwards isn't copied back to home. We found this with ABAQUS code running on our HPC nodes onto the AFM cache, I.e. 
At home, the final packed output file from ABAQUS is corrupt as its the "untruncated and then filled" version of the file (so just a big blob of empty data). I would guess that anything using truncate would see the same issue. 4.2.3.x: APAR IV99796 See IBM Flash Alert at: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010629&myns=s033&mynp=O CSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E Its remedied in efix2, of course remember that an efix has not gone through a full testing validation cycle (otherwise it would be a PTF), but we have not seen any issues in our environments running 4.2.3.4efix2. Simon On 29/09/2017, 10:04, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A" wrote: >Efixes (in my one time only limited experience!) come direct from IBM as >a result of a PMR. >Richard > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns >Sent: 29 September 2017 10:02 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Simon, >I would appreciate a heads up on that AFM issue. >I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is >if a remote NFS mount goes down then an asynchronous operation such as a >read can be stopped. > >I must admit to being not clued up on how the efixes are distributed. I >downloaded the 4.2.3.4 installer for Linux yesterday. >Should I be searching for additional fix packs on top of that (which I am >in fact doing now). > >John H > > > > > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (IT Research Support) >Sent: Thursday, September 28, 2017 4:45 PM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] el7.4 compatibility > > >Aren't listed as tested > >Sorry ... >4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM >issue we have. > >Simon > >On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" > wrote: > >> >> >>On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >>> The 7.4 kernels are listed as having been tested by IBM. >>Hi, >> >>Were did you find this? >>> >>> Having said that, we have clients running 7.4 kernel and its OK, but >>> we are 4.2.3.4efix2, so bump versions... >>Do you have some information about the efix2? Is this for 7.4 ? And >>where should we find this :-) >> >>Thank you! >> >>Kenneth >> >>> >>> Simon >>> >>> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>>behalf of Jeffrey R. Lang" >>on behalf of JRLang at uwyo.edu> wrote: >>> >>>> I just tired to build the GPFS GPL module against the latest version >>>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>>should work. >>>> >>>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>>> kdump-kern.o: In function `GetOffset': >>>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>>> kdump-kern.o: In function `KernInit': >>>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>>> collect2: error: ld returned 1 exit status >>>> make[1]: *** [modules] Error 1 >>>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>>> make: *** [Modules] Error 1 >>>> -------------------------------------------------------- >>>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT >>>>2017. 
>>>> -------------------------------------------------------- >>>> mmbuildgpl: Command failed. Examine previous error messages to >>>>determine cause. >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# uname -a >>>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>>Sep 9 >>>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>>> [root at bkupsvr3 ~]# mmdiag --version >>>> >>>> === mmdiag: version === >>>> Current GPFS build: "4.2.2.3 ". >>>> Built on Mar 16 2017 at 11:19:59 >>>> >>>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>>> my case 514.26.2 >>>> >>>> If I'm missing something can some one point me in the right direction? >>>> >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org >>>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>>> Banister >>>> Sent: Thursday, September 28, 2017 8:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>>> >>>> Please review this site: >>>> >>>> >>>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>>ml >>>> >>>> Hope that helps, >>>> -Bryan >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org >>>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>>> Greg.Lehmann at csiro.au >>>> Sent: Wednesday, September 27, 2017 6:45 PM >>>> To: gpfsug-discuss at spectrumscale.org >>>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>>> >>>> Note: External Email >>>> ------------------------------------------------- >>>> >>>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org >>>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>>> Kenneth Waegeman >>>> Sent: Wednesday, 27 September 2017 6:17 PM >>>> To: gpfsug-discuss at spectrumscale.org >>>> Subject: [gpfsug-discuss] el7.4 compatibility >>>> >>>> Hi, >>>> >>>> Is there already some information available of gpfs (and protocols) >>>> on >>>> el7.4 ? >>>> >>>> Thanks! 
>>>> >>>> Kenneth >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>>> tqc6pw%3D&reserved=0 _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>>> tqc6pw%3D&reserved=0 >>>> >>>> >>>> ________________________________ >>>> >>>> Note: This email is for the confidential use of the named >>>>addressee(s) only and may contain proprietary, confidential or >>>>privileged information. >>>> If you are not the intended recipient, you are hereby notified that >>>>any review, dissemination or copying of this email is strictly >>>>prohibited, and to please notify the sender immediately and destroy >>>>this email and any attachments. Email transmission cannot be >>>>guaranteed to be secure or error-free. The Company, therefore, does >>>>not make any guarantees as to the completeness or accuracy of this >>>>email or any attachments. This email is for informational purposes >>>>only and does not constitute a recommendation, offer, request or >>>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>>any type of transaction of a financial product. 
>>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> >>>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>>pw%3D&reserved=0 _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> >>>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>>pw%3D&reserved=0 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>> sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>> rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>> 39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>> pw%3D&reserved=0 >> > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.o >rg%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml >.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a39d93e96cad61fc >%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6pw%3D&reserved=0 >-- The information contained in this communication and any attachments is >confidential and may be privileged, and is for the sole use of the >intended recipient(s). Any unauthorized review, use, disclosure or >distribution is prohibited. Unless explicitly stated otherwise in the >body of this communication or the attachment thereto (if any), the >information is provided on an AS-IS basis without any express or implied >warranties or liabilities. To the extent you are relying on this >information, you are doing so at your own risk. If you are not the >intended recipient, please notify the sender immediately by replying to >this message and destroy all copies of this message and any attachments. >Neither the sender nor the company/group of companies he or she >represents shall be liable for the proper and complete transmission of >the information contained in this communication, or for any delay in its >receipt. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Fri Sep 29 13:26:51 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 29 Sep 2017 07:26:51 -0500 Subject: [gpfsug-discuss] Latest recommended 4.2 efix? In-Reply-To: References: Message-ID: There isn't a "recommended" efix as such. Generally, fixes go into the next ptf so that they go through a test cycle. 
If a customer hits a serious issue that cannot wait for the next ptf, they can request an efix be built, but since efixes do not get the same level of rigorous testing as a ptf, they are not generally recommended unless you report an issue and service determines you need it. To address your other questions: We are currently up to efix3 on 4.2.3.4 We don't announce PTF dates, because they depend upon the testing; however, you can see that we generally release a PTF roughly every 6 weeks and I believe ptf4 was out on 8/24 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: "discussion, gpfsug main" Date: 09/28/2017 08:59 PM Subject: [gpfsug-discuss] Latest recommended 4.2 efix? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Everyone, What?s the latest recommended efix release for 4.2.3.4? I?m working on testing a 4.1 to 4.2 migration and was reminded today of some fun bugs in 4.2.3.4 for which I think there are efixes. Alternatively, any word on a 4.2.3.5 release date? -Aaron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=IVcYH9EDg-UaA4Jt2GbsxN5XN1XbvejXTX0gAzNxtpM&s=9SmogyyA6QNSWxlZrpE-vBbslts0UexwJwPzp78LgKs&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sandeep.patil at in.ibm.com Sat Sep 30 05:02:22 2017 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Sat, 30 Sep 2017 09:32:22 +0530 Subject: [gpfsug-discuss] Spectrum Scale Enablement Material - 1H 2017 Message-ID: Hi Folks I was asked by Doris Conti to send the below to our Spectrum Scale User group. Below is a consolidated link that list all the enablement on Spectrum Scale/ESS that was done in 1H 2017 - which have blogs and videos from development and offering management. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media Do note, Spectrum Scale developers keep blogging on the below site which is worth bookmarking: https://developer.ibm.com/storage/blog/ (as recent as 4 new blogs in Sept) Thanks Sandeep Linkedin: https://www.linkedin.com/in/sandeeprpatil Spectrum Scale Dev. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Fri Sep 1 23:42:55 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 01 Sep 2017 22:42:55 +0000 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes?
In-Reply-To: <20170901165625.6e4edd4c@osc.edu> References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi Ed, yes the defaults for that have changed for customers who had not overridden the default settings. the reason we did this was that many systems in the field including all ESS systems that come pre-tuned where manually changed to 8k from the 16k default due to better performance that was confirmed in multiple customer engagements and tests with various settings , therefore we change the default to what it should be in the field so people are not bothered to set it anymore (simplification) or get benefits by changing the default to provides better performance. all this happened when we did the communication code overhaul that did lead to significant (think factors) of improved RPC performance for RDMA and VERBS workloads. there is another round of significant enhancements coming soon , that will make even more parameters either obsolete or change some of the defaults for better out of the box performance. i see that we should probably enhance the communication of this changes, not that i think this will have any negative effect compared to what your performance was with the old setting i am actually pretty confident that you get better performance with the new code, but by setting parameters back to default on most 'manual tuned' probably makes your system even faster. if you have a Scale Client on 4.2.3+ you really shouldn't have anything set beside maxfilestocache, pagepool, workerthreads and potential prefetch , if you are a protocol node, this and settings specific to an export (e.g. SMB, NFS set some special settings) , pretty much everything else these days should be set to default so the code can pick the correct parameters., if its not and you get better performance by manual tweaking something i like to hear about it. on the communication side in the next release will eliminate another set of parameters that are now 'auto set' and we plan to work on NSD next. i presented various slides about the communication and simplicity changes in various forums, latest public non NDA slides i presented are here --> http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf hope this helps . Sven On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl wrote: > Howdy. Just noticed this change to min RDMA packet size and I don't seem > to > see it in any patch notes. Maybe I just skipped the one where this > changed? > > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of things > before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 <(614)%20292-9302> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From truongv at us.ibm.com Fri Sep 1 23:56:23 2017 From: truongv at us.ibm.com (Truong Vu) Date: Fri, 1 Sep 2017 18:56:23 -0400 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: Message-ID: The discrepancy between the mmlsconfig view and mmdiag has been fixed in GFPS 4.2.3 version. Note, mmdiag reports the correct default value. Tru. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/01/2017 06:43 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=xZHUN9ZlFjvgBmBB8wnX2cQDQQV42R_q-xHubNA3JBM&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: GPFS GUI Nodes > NSD no data (Sobey, Richard A) 2. Change to default for verbsRdmaMinBytes? (Edward Wahl) 3. Quorum managers (Joshua Akers) 4. Re: Change to default for verbsRdmaMinBytes? (Sven Oehme) ---------------------------------------------------------------------- Message: 1 Date: Fri, 1 Sep 2017 13:36:56 +0000 From: "Sobey, Richard A" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS GUI Nodes > NSD no data Message-ID: Content-Type: text/plain; charset="us-ascii" Resolved this, guessed at changing GPFSNSDDisk.period to 5. From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 01 September 2017 09:45 To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] GPFS GUI Nodes > NSD no data For some time now if I go into the GUI, select Monitoring > Nodes > NSD Server Nodes, the only columns with good data are Name, State and NSD Count. Everything else e.g. Avg Disk Wait Read is listed "N/A". Is this another config option I need to enable? It's been bugging me for a while, I don't think I've seen it work since 4.2.1 which was the first time I saw the GUI. Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170901_2a4162e9_attachment-2D0001.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=jcPGl5zwtQFMbnEmBpNErsD43uwoVeKgKk_8j7ZeCJY&e= > ------------------------------ Message: 2 Date: Fri, 1 Sep 2017 16:56:25 -0400 From: Edward Wahl To: gpfsug main discussion list Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? Message-ID: <20170901165625.6e4edd4c at osc.edu> Content-Type: text/plain; charset="US-ASCII" Howdy. Just noticed this change to min RDMA packet size and I don't seem to see it in any patch notes. Maybe I just skipped the one where this changed? 
mmlsconfig verbsRdmaMinBytes verbsRdmaMinBytes 16384 (in case someone thinks we changed it) [root at proj-nsd01 ~]# mmlsconfig |grep verbs verbsRdma enable verbsRdma disable verbsRdmasPerConnection 14 verbsRdmasPerNode 1024 verbsPorts mlx5_3/1 verbsPorts mlx4_0 verbsPorts mlx5_0 verbsPorts mlx5_0 mlx5_1 verbsPorts mlx4_1/1 verbsPorts mlx4_1/2 Oddly I also see this in config, though I've seen these kinds of things before. mmdiag --config |grep verbsRdmaMinBytes verbsRdmaMinBytes 8192 We're on a recent efix. Current GPFS build: "4.2.2.3 efix21 (1028007)". -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ------------------------------ Message: 3 Date: Fri, 01 Sep 2017 21:06:15 +0000 From: Joshua Akers To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Quorum managers Message-ID: Content-Type: text/plain; charset="utf-8" Hi all, I was wondering how most people set up quorum managers. We historically had physical admin nodes be the quorum managers, but are switching to a virtualized admin services infrastructure. We have been choosing a few compute nodes to act as quorum managers in our client clusters, but have considered using virtual machines instead. Has anyone else done this? Regards, Josh -- *Joshua D. Akers* *HPC Team Lead* NI&S Systems Support (MC0214) 1700 Pratt Drive Blacksburg, VA 24061 540-231-9506 -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170901_a49947db_attachment-2D0001.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=Gag0raQbp7KZAyINlnmuxlnpjboo9XOWO3dDL2HCsZo&e= > ------------------------------ Message: 4 Date: Fri, 01 Sep 2017 22:42:55 +0000 From: Sven Oehme To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? Message-ID: Content-Type: text/plain; charset="utf-8" Hi Ed, yes the defaults for that have changed for customers who had not overridden the default settings. the reason we did this was that many systems in the field including all ESS systems that come pre-tuned where manually changed to 8k from the 16k default due to better performance that was confirmed in multiple customer engagements and tests with various settings , therefore we change the default to what it should be in the field so people are not bothered to set it anymore (simplification) or get benefits by changing the default to provides better performance. all this happened when we did the communication code overhaul that did lead to significant (think factors) of improved RPC performance for RDMA and VERBS workloads. there is another round of significant enhancements coming soon , that will make even more parameters either obsolete or change some of the defaults for better out of the box performance. i see that we should probably enhance the communication of this changes, not that i think this will have any negative effect compared to what your performance was with the old setting i am actually pretty confident that you get better performance with the new code, but by setting parameters back to default on most 'manual tuned' probably makes your system even faster. if you have a Scale Client on 4.2.3+ you really shouldn't have anything set beside maxfilestocache, pagepool, workerthreads and potential prefetch , if you are a protocol node, this and settings specific to an export (e.g. 
SMB, NFS set some special settings) , pretty much everything else these days should be set to default so the code can pick the correct parameters., if its not and you get better performance by manual tweaking something i like to hear about it. on the communication side in the next release will eliminate another set of parameters that are now 'auto set' and we plan to work on NSD next. i presented various slides about the communication and simplicity changes in various forums, latest public non NDA slides i presented are here --> https://urldefense.proofpoint.com/v2/url?u=http-3A__files.gpfsug.org_presentations_2017_Manchester_08-5FResearch-5FTopics.pdf&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=8c_55Ld_iAC2sr_QU0cyGiOiyU7Z9NjcVknVuRpRIlk&e= hope this helps . Sven On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl wrote: > Howdy. Just noticed this change to min RDMA packet size and I don't seem > to > see it in any patch notes. Maybe I just skipped the one where this > changed? > > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of things > before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 <(614)%20292-9302> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=xZHUN9ZlFjvgBmBB8wnX2cQDQQV42R_q-xHubNA3JBM&e= > -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170901_b75cfc74_attachment.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=LpVpXMgqE_LD-t_J7yfNwURUrdUR29TzWvjVTi18kpA&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=yK4FkYvJ21ubvurR6W1Pi3qvNw9ydj2XP0ghXPc7DUw&s=xZHUN9ZlFjvgBmBB8wnX2cQDQQV42R_q-xHubNA3JBM&e= End of gpfsug-discuss Digest, Vol 68, Issue 2 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Sat Sep 2 10:35:34 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Sat, 2 Sep 2017 09:35:34 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Message-ID: Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. Pid=5134 Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From truongv at us.ibm.com Sat Sep 2 12:40:15 2017 From: truongv at us.ibm.com (Truong Vu) Date: Sat, 2 Sep 2017 07:40:15 -0400 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: Message-ID: The dates that have the zone abbreviation are from the scripts which use the OS date command. The daemon has its own format. This inconsistency has been address in 4.2.2. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/02/2017 07:00 AM Subject: gpfsug-discuss Digest, Vol 68, Issue 4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=pd3-zi8UQxVOjxOYxqbuaFSvv_71WENUBJsw0KUV3ro&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Date formats inconsistent mmfs.log (Sobey, Richard A) ---------------------------------------------------------------------- Message: 1 Date: Sat, 2 Sep 2017 09:35:34 +0000 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Message-ID: Content-Type: text/plain; charset="us-ascii" Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. 
Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. Pid=5134 Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170902_4f65f336_attachment-2D0001.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=fNT71mM8obJ9rwxzm3Uzxw4mayi2pQg1u950E1raYK4&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=pd3-zi8UQxVOjxOYxqbuaFSvv_71WENUBJsw0KUV3ro&e= End of gpfsug-discuss Digest, Vol 68, Issue 4 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From john.hearns at asml.com Mon Sep 4 08:43:59 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 4 Sep 2017 07:43:59 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: Message-ID: Richard, The date format changed at an update level. We recently updated to 4.2.3 and when you run mmchconfig release=LATEST you are prompted to confirm that the new log format can be used. I guess you might not have cut all nodes over yet on your update over the weekend? Cut and paste from the documentation: mmfsLogTimeStampISO8601={yes | no} Setting this parameter to no allows the cluster to continue running with the earlier log time stamp format. For more information, see Security mode. * Set mmfsLogTimeStampISO8061 to no if you save log information and you are not yet ready to switch to the new log time stamp format. After you complete the migration, you can change the log time stamp format at any time with the mmchconfig command. * Omit this parameter if you are ready to switch to the new format. The default value is yes From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Saturday, September 02, 2017 11:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. 
Pid=5134 Cheers Richard -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Sep 4 09:05:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 4 Sep 2017 08:05:10 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: , Message-ID: Ah. I'm running 4.2.3 but haven't changed the release level. I'll get that sorted out. Thanks for the replies! Get Outlook for Android ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of John Hearns Sent: Monday, September 4, 2017 8:43:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Date formats inconsistent mmfs.log Richard, The date format changed at an update level. We recently updated to 4.2.3 and when you run mmchconfig release=LATEST you are prompted to confirm that the new log format can be used. I guess you might not have cut all nodes over yet on your update over the weekend? Cut and paste from the documentation: mmfsLogTimeStampISO8601={yes | no} Setting this parameter to no allows the cluster to continue running with the earlier log time stamp format. For more information, see Security mode. ? Set mmfsLogTimeStampISO8061 to no if you save log information and you are not yet ready to switch to the new log time stamp format. After you complete the migration, you can change the log time stamp format at any time with the mmchconfig command. ? Omit this parameter if you are ready to switch to the new format. The default value is yes From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Saturday, September 02, 2017 11:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. 
Pid=5134 Cheers Richard -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Mon Sep 4 13:02:49 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Mon, 4 Sep 2017 12:02:49 +0000 Subject: [gpfsug-discuss] Looking for Use-Cases with Spectrum Scale / ESS with vRanger & VMware Message-ID: An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Mon Sep 4 17:48:20 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Mon, 4 Sep 2017 16:48:20 +0000 Subject: [gpfsug-discuss] Use AFM for migration of many small files Message-ID: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> Hello, We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and I need to look elsewhere to get better performance for prefetch of many smaller files? We will migrate several filesets in parallel, but still with individual filesets up to 350TB in size 150MB/s isn?t fun. Also just about 150 files/s seconds looks poor. The setup is quite new, hence there may be other places to look at. It?s all RHEL7 an spectrum scale 4.2.2-3 on the afm cache. Thank you, Heiner --, Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From vpuvvada at in.ibm.com Tue Sep 5 15:27:21 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 5 Sep 2017 19:57:21 +0530 Subject: [gpfsug-discuss] Use AFM for migration of many small files In-Reply-To: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> References: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> Message-ID: Which version of Spectrum Scale ? What is the fileset mode ? >We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here >I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. How was the performance measured ? 
If parallel IO is enabled, AFM uses multiple gateway nodes to prefetch the large files (if the file size is more than 1GB). The performance difference between small and large files is huge here (1000MB/s - 150MB/s = 850MB/s), and generally that is not the case. How many files were present in the list file for prefetch? Could you also share a full internaldump from the gateway node? >I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few >read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. AFM prefetches the files on multiple threads. Default flush threads for prefetch are 36 (fileset.afmNumFlushThreads (default 4) + afmNumIOFlushThreads (default 32)). >Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and >I need to look elsewhere to get better performance for prefetch of many smaller files? See above, AFM reads files on multiple threads in parallel. Try increasing afmNumFlushThreads on the fileset and verify whether it improves the performance. ~Venkat (vpuvvada at in.ibm.com) From: "Billich Heinrich Rainer (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 09/04/2017 10:18 PM Subject: [gpfsug-discuss] Use AFM for migration of many small files Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here I see just ~150MB/s - compare this to the 1000+MB/s we get for larger files. I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and I need to look elsewhere to get better performance for prefetch of many smaller files? We will migrate several filesets in parallel, but still with individual filesets up to 350TB in size 150MB/s isn't fun. Also, just about 150 files per second looks poor. The setup is quite new, hence there may be other places to look at. It's all RHEL7 and Spectrum Scale 4.2.2-3 on the AFM cache. Thank you, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed...
URL: From kenneth.waegeman at ugent.be Wed Sep 6 12:55:20 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Wed, 6 Sep 2017 13:55:20 +0200 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi Sven, I see two parameters that we have set to non-default values that are not in your list of options still to configure. verbsRdmasPerConnection (256) and socketMaxListenConnections (1024) I remember we had to set socketMaxListenConnections because our cluster consist of +550 nodes. Are these settings still needed, or is this also tackled in the code? Thank you!! Cheers, Kenneth On 02/09/17 00:42, Sven Oehme wrote: > Hi Ed, > > yes the defaults for that have changed for customers who had not > overridden the default settings. the reason we did this was that many > systems in the field including all ESS systems that come pre-tuned > where manually changed to 8k from the 16k default due to better > performance that was confirmed in multiple customer engagements and > tests with various settings , therefore we change the default to what > it should be in the field so people are not bothered to set it anymore > (simplification) or get benefits by changing the default to provides > better performance. > all this happened when we did the communication code overhaul that did > lead to significant (think factors) of improved RPC performance for > RDMA and VERBS workloads. > there is another round of significant enhancements coming soon , that > will make even more parameters either obsolete or change some of the > defaults for better out of the box performance. > i see that we should probably enhance the communication of this > changes, not that i think this will have any negative effect compared > to what your performance was with the old setting i am actually pretty > confident that you get better performance with the new code, but by > setting parameters back to default on most 'manual tuned' probably > makes your system even faster. > if you have a Scale Client on 4.2.3+ you really shouldn't have > anything set beside maxfilestocache, pagepool, workerthreads and > potential prefetch , if you are a protocol node, this and settings > specific to an export (e.g. SMB, NFS set some special settings) , > pretty much everything else these days should be set to default so the > code can pick the correct parameters., if its not and you get better > performance by manual tweaking something i like to hear about it. > on the communication side in the next release will eliminate another > set of parameters that are now 'auto set' and we plan to work on NSD > next. > i presented various slides about the communication and simplicity > changes in various forums, latest public non NDA slides i presented > are here --> > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf > > hope this helps . > > Sven > > > > On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl > wrote: > > Howdy. Just noticed this change to min RDMA packet size and I > don't seem to > see it in any patch notes. Maybe I just skipped the one where > this changed? 
> > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of > things before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Sep 6 13:22:41 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 6 Sep 2017 14:22:41 +0200 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Sep 6 13:29:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 6 Sep 2017 12:29:44 +0000 Subject: [gpfsug-discuss] Save the date! GPFS-UG meeting at SC17 - Sunday November 12th Message-ID: <7838054B-8A46-46A0-8A53-81E3049B4AE7@nuance.com> The 2017 Supercomputing conference is only 2 months away, and here?s a reminder to come early and attend the GPFS user group meeting. The meeting is tentatively scheduled from the afternoon of Sunday, November 12th. Exact location and times are still being discussed. If you have an interest in presenting at the user group meeting, please let us know. More details in the coming weeks. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Sep 6 13:35:45 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 06 Sep 2017 12:35:45 +0000 Subject: [gpfsug-discuss] filesets inside of filesets Message-ID: Today we have following fileset structure on our filesystem: /projects <-- gpfs filesystem /projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.: /projects/b1000 (b1000 has 10TB quota applied) /projects/b1000/backup (backup has 1TB quota applied) Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Sep 6 13:43:09 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 6 Sep 2017 12:43:09 +0000 Subject: [gpfsug-discuss] filesets inside of filesets In-Reply-To: References: Message-ID: Filesets in filesets are fine. BUT if you use scoped backups with TSM... 
Er Spectrum Protect, then there are restrictions on creating an IFS inside an IFS ... Simon From: > on behalf of "damir.krstic at gmail.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 6 September 2017 at 13:35 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] filesets inside of filesets Today we have following fileset structure on our filesystem: /projects <-- gpfs filesystem /projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.: /projects/b1000 (b1000 has 10TB quota applied) /projects/b1000/backup (backup has 1TB quota applied) Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Wed Sep 6 13:51:47 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 6 Sep 2017 14:51:47 +0200 Subject: [gpfsug-discuss] filesets inside of filesets In-Reply-To: References: Message-ID: Hello Damir, the files that belong to your fileset "backup" has a separate quota, it is not related to the quota in "b1000". There is no cumulative quota. Fileset Nesting may need other considerations as well, in some cases filesets behave different than simple directories. -> For NFSV4 ACLs, inheritance stops at the fileset boundaries -> Snapshots include the independent parent and the dependent children. Nested independent filesets are not included in a fileset snapshot. -> Export protocols like NFS or SMB will cross fileset boundaries and just treat them like a directory. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina K?deritz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Damir Krstic To: gpfsug main discussion list Date: 09/06/2017 02:36 PM Subject: [gpfsug-discuss] filesets inside of filesets Sent by: gpfsug-discuss-bounces at spectrumscale.org Today we have following fileset structure on our filesystem: /projects <-- gpfs filesystem /projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.: /projects/b1000 (b1000 has 10TB quota applied) /projects/b1000/backup (backup has 1TB quota applied) Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=5jyA3TazAAOckIeQUeIG0CJ4TG0aMWv7jDLDk3gYNkE&s=CbzPKTgh7mO6om2LTQr94LM1qfshrEdm58cJydejAfE&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1B378274.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Wed Sep 6 14:32:40 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 06 Sep 2017 13:32:40 +0000 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi, you still need both of them, but they are both on the list to be removed, the first is already integrated for the next major release, the 2nd we still work on. Sven On Wed, Sep 6, 2017 at 4:55 AM Kenneth Waegeman wrote: > Hi Sven, > > I see two parameters that we have set to non-default values that are not > in your list of options still to configure. > verbsRdmasPerConnection (256) and > socketMaxListenConnections (1024) > > I remember we had to set socketMaxListenConnections because our cluster > consist of +550 nodes. > > Are these settings still needed, or is this also tackled in the code? > > Thank you!! > > Cheers, > Kenneth > > > > On 02/09/17 00:42, Sven Oehme wrote: > > Hi Ed, > > yes the defaults for that have changed for customers who had not > overridden the default settings. the reason we did this was that many > systems in the field including all ESS systems that come pre-tuned where > manually changed to 8k from the 16k default due to better performance that > was confirmed in multiple customer engagements and tests with various > settings , therefore we change the default to what it should be in the > field so people are not bothered to set it anymore (simplification) or get > benefits by changing the default to provides better performance. > all this happened when we did the communication code overhaul that did > lead to significant (think factors) of improved RPC performance for RDMA > and VERBS workloads. > there is another round of significant enhancements coming soon , that will > make even more parameters either obsolete or change some of the defaults > for better out of the box performance. > i see that we should probably enhance the communication of this changes, > not that i think this will have any negative effect compared to what your > performance was with the old setting i am actually pretty confident that > you get better performance with the new code, but by setting parameters > back to default on most 'manual tuned' probably makes your system even > faster. > if you have a Scale Client on 4.2.3+ you really shouldn't have anything > set beside maxfilestocache, pagepool, workerthreads and potential prefetch > , if you are a protocol node, this and settings specific to an export > (e.g. SMB, NFS set some special settings) , pretty much everything else > these days should be set to default so the code can pick the correct > parameters., if its not and you get better performance by manual tweaking > something i like to hear about it. > on the communication side in the next release will eliminate another set > of parameters that are now 'auto set' and we plan to work on NSD next. 
> i presented various slides about the communication and simplicity changes > in various forums, latest public non NDA slides i presented are here --> > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf > > hope this helps . > > Sven > > > > On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl < ewahl at osc.edu> > wrote: > >> Howdy. Just noticed this change to min RDMA packet size and I don't >> seem to >> see it in any patch notes. Maybe I just skipped the one where this >> changed? >> >> mmlsconfig verbsRdmaMinBytes >> verbsRdmaMinBytes 16384 >> >> (in case someone thinks we changed it) >> >> [root at proj-nsd01 ~]# mmlsconfig |grep verbs >> verbsRdma enable >> verbsRdma disable >> verbsRdmasPerConnection 14 >> verbsRdmasPerNode 1024 >> verbsPorts mlx5_3/1 >> verbsPorts mlx4_0 >> verbsPorts mlx5_0 >> verbsPorts mlx5_0 mlx5_1 >> verbsPorts mlx4_1/1 >> verbsPorts mlx4_1/2 >> >> >> Oddly I also see this in config, though I've seen these kinds of things >> before. >> mmdiag --config |grep verbsRdmaMinBytes >> verbsRdmaMinBytes 8192 >> >> We're on a recent efix. >> Current GPFS build: "4.2.2.3 efix21 (1028007)". >> >> -- >> >> Ed Wahl >> Ohio Supercomputer Center >> 614-292-9302 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Wed Sep 6 17:16:18 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Wed, 6 Sep 2017 16:16:18 +0000 Subject: [gpfsug-discuss] Use AFM for migration of many small files Message-ID: <7D6EFD03-5D74-4A7B-A0E8-2AD41B050E15@psi.ch> Hello Venkateswara, Edward, Thank you for the comments on how to speed up AFM prefetch with small files. We run 4.2.2-3 and the AFM mode is RO and we have just a single gateway, i.e. no parallel reads for large files. We will try to increase the value of afmNumFlushThreads. It wasn't clear to me that these threads do read from home, too - at least for prefetch. First I will try a plain NFS mount and see how parallel reads of many small files scale the throughput. Next I will try AFM prefetch. I don't do nice benchmarking, just watching dstat output. We prefetch 100,000 files in one bunch, so there is ample time to observe. The basic issue is that we get just about 45MB/s for sequential read of many 1000 files with 1MB per file on the home cluster, i.e. we read one file at a time before we switch to the next. This is no surprise. Each read takes about 20ms to complete, so at most we get 50 reads of 1MB per second. We've seen this on classical RAID storage and on DSS/ESS systems. It's likely just the physics of spinning disks and the fact that we do one read at a time and don't allow any parallelism: we wait for one or two I/Os to single disks to complete before we continue. With larger files prefetch jumps in and fires many reads in parallel. To get 1,000MB/s I need to do 1,000 reads/s and need to have ~20 reads in progress in parallel all the time - we'll see how close we get to 1,000MB/s with "many small files".
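Roughly what we plan to try on the cache cluster is sketched below. The file system, fileset and list file names (fs1, migr_fset, /tmp/small-files.list) are only placeholders, and the exact syntax, the allowed values and whether the fileset has to be unlinked first still need to be checked against the mmchfileset, mmchconfig and mmafmctl documentation for 4.2.2 before we run it:

# Raise the per-fileset AFM flush threads used by prefetch
# (default 4 as quoted above; 32 is just a value to test with)
mmchfileset fs1 migr_fset -p afmNumFlushThreads=32

# Possibly also raise the cluster-wide IO flush threads on the gateway
# (afmNumIOFlushThreads, default 32 as quoted above)
mmchconfig afmNumIOFlushThreads=64 -i

# Re-run the prefetch with a list file of the small files and watch dstat
# on the gateway while it runs
mmafmctl fs1 prefetch -j migr_fset --list-file /tmp/small-files.list

If the throughput scales with the thread count then the per-file latency really is the limit; if it does not, the plain NFS mount test should tell us whether the bottleneck sits on the home side instead.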
Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From stijn.deweirdt at ugent.be Wed Sep 6 18:13:48 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 6 Sep 2017 19:13:48 +0200 Subject: [gpfsug-discuss] mixed verbsRdmaSend Message-ID: hi all, what is the expected behaviour of a mixed verbsRdmaSend setup: some nodes enabled, most disabled. we have some nodes that have a very high iops workload, but most of the cluster of 500+ nodes do not have such usecase. we enabled verbsRdmaSend on the managers/quorum nodes (<10) and on the few (<10) clients with this workload, but not on the others (500+). it seems to work out fine, but is this acceptable as config? (the docs mention that enabling verbsrdamSend on a> 100 nodes might lead to errors). the nodes use ipoib as ip network, and running with verbsRdmaSend disabled on all nodes leads to unstable cluster (TX errors (<1 error in 1M packets) on some clients leading to gpfs expel nodes etc). (we still need to open a case wil mellanox to investigate further) many thanks, stijn From gcorneau at us.ibm.com Thu Sep 7 00:30:23 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Wed, 6 Sep 2017 18:30:23 -0500 Subject: [gpfsug-discuss] Happy 20th birthday GPFS !! Message-ID: Sorry I missed the anniversary of your conception (announcement letter) back on August 26th, so I hope you'll accept my belated congratulations on this long and exciting journey! https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS297-318 I remember your parent, PIOFS, as well! Ahh the fun times. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 26117 bytes Desc: not available URL: From xhejtman at ics.muni.cz Thu Sep 7 16:07:20 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 7 Sep 2017 17:07:20 +0200 Subject: [gpfsug-discuss] Overwritting migrated files Message-ID: <20170907150720.h3t5fowvdlibvik4@ics.muni.cz> Hello, we have files about 100GB per file. Many of these files are migrated to tapes. (GPFS+TSM, tape storage is external pool and dsmmigrate, dsmrecall are in place). These files are images from bacula backup system. When bacula wants to reuse some of images, it needs to truncate the file to 64kB and overwrite it. Is there a way not to recall whole 100GB from tapes for only to truncate the file? I tried to do partial recall: dsmrecall -D -size=65k Vol03797 after recall processing finished, I tried to truncate the file using: dd if=/dev/zero of=Vol03797 count=0 bs=64k seek=1 which caused futher recall of the whole file: $ dsmls Vol03797 IBM Spectrum Protect Command Line Space Management Client Interface Client Version 8, Release 1, Level 2.0 Client date/time: 09/07/2017 15:01:59 (c) Copyright by IBM Corporation and other(s) 1990, 2017. All Rights Reserved. ActS ResS ResB FSt FName 107380819676 10485760 31373312 m (p) Vol03797 and ResB size has been growing to 107380819676. After dd finished: dsmls Vol03797 IBM Spectrum Protect Command Line Space Management Client Interface Client Version 8, Release 1, Level 2.0 Client date/time: 09/07/2017 15:08:03 (c) Copyright by IBM Corporation and other(s) 1990, 2017. All Rights Reserved. 
ActS ResS ResB FSt FName 65536 65536 64 r Vol03797 Is there another way to truncate the file and drop whole migrated part? -- Luk?? Hejtm?nek From john.hearns at asml.com Thu Sep 7 16:15:00 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 7 Sep 2017 15:15:00 +0000 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig Message-ID: If I have an AFM setup where the home is located on a generic NFS share, let's say server:/volume/share When I come ot set this up does it make sense to run mmafmconfig on /volume/share ? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Thu Sep 7 16:33:58 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Thu, 7 Sep 2017 15:33:58 +0000 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig In-Reply-To: References: Message-ID: I think you need to configure a gateway node (use mmchnode to change an existing node class to gateway) Then use mmafmconfig to setup export server maps on the gateway node. e.g. mmafmconfig -add "mapping1" -export-map "nfsServerIP"/"GatewayNode" (double quotes not required) mafmconfig show all Map name: mapping1 Export server map: IP/GatewayNode From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 07 September 2017 16:15 To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig If I have an AFM setup where the home is located on a generic NFS share, let's say server:/volume/share When I come ot set this up does it make sense to run mmafmconfig on /volume/share ? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. 
Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Thu Sep 7 16:52:19 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 7 Sep 2017 15:52:19 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Message-ID: Firmly lining myself up for a smack round the chops with a wet haddock... I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From janusz.malka at desy.de Thu Sep 7 20:23:36 2017 From: janusz.malka at desy.de (Malka, Janusz) Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> I had similar issue, I had to recover connection to home From: "John Hearns" To: "gpfsug main discussion list" Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. 
I find this reference, which is about as useful as a wet haddock: [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Thu Sep 7 22:16:34 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 7 Sep 2017 21:16:34 +0000 Subject: [gpfsug-discuss] SMB2 leases - oplocks - growing files In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 03:11:48 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 22:11:48 -0400 Subject: [gpfsug-discuss] mmfsd write behavior Message-ID: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Hi Everyone, This is something that's come up in the past and has recently resurfaced with a project I've been working on, and that is-- it seems to me as though mmfsd never attempts to flush the cache of the block devices its writing to (looking at blktrace output seems to confirm this). Is this actually the case? I've looked at the gpl headers for linux and I don't see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or REQ_FLUSH. I'm sure there's other ways to trigger this behavior that GPFS may very well be using that I've missed. That's why I'm asking :) I figure with FPO being pushed as an HDFS replacement using commodity drives this feature has *got* to be in the code somewhere. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Sep 8 03:55:14 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 08 Sep 2017 02:55:14 +0000 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: I am not sure what exactly you are looking for but all blockdevices are opened with O_DIRECT , we never cache anything on this layer . 
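If you want to see what actually reaches the devices you can trace the block layer underneath one NSD while GPFS is writing, or simply take the drive's volatile write cache out of the picture. A rough sketch with standard Linux tools - /dev/sdb is only an example device, and the exact RWBS flag letters are worth double checking against the blkparse man page on your kernel:

# Trace one NSD device while the workload runs; explicit cache flushes / FUA
# writes should show up with an 'F' in the RWBS column
blktrace -d /dev/sdb -o - | blkparse -i - | less

# Check and, if you do not trust the power-loss behaviour, disable the
# volatile write cache on the drive itself
hdparm -W /dev/sdb                   # SATA: show write-cache state
hdparm -W0 /dev/sdb                  # SATA: disable the write cache
sdparm --get=WCE /dev/sdb            # SAS/SCSI: show the WCE bit
sdparm --clear=WCE --save /dev/sdb   # SAS/SCSI: clear it persistently

With the volatile cache disabled, a completed O_DIRECT write should already be on stable media whether or not anyone sends an explicit flush, which is usually a simpler answer for NSDs than relying on the flush semantics of every device in the stack.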
On Thu, Sep 7, 2017, 7:11 PM Aaron Knister wrote: > Hi Everyone, > > This is something that's come up in the past and has recently resurfaced > with a project I've been working on, and that is-- it seems to me as > though mmfsd never attempts to flush the cache of the block devices its > writing to (looking at blktrace output seems to confirm this). Is this > actually the case? I've looked at the gpl headers for linux and I don't > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > GPFS may very well be using that I've missed. That's why I'm asking :) > > I figure with FPO being pushed as an HDFS replacement using commodity > drives this feature has *got* to be in the code somewhere. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 04:05:42 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 23:05:42 -0400 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: Thanks Sven. I didn't think GPFS itself was caching anything on that layer, but it's my understanding that O_DIRECT isn't sufficient to force I/O to be flushed (e.g. the device itself might have a volatile caching layer). Take someone using ZFS zvol's as NSDs. I can write() all day log to that zvol (even with O_DIRECT) but there is absolutely no guarantee those writes have been committed to stable storage and aren't just sitting in RAM until an fsync() occurs (or some other bio function that causes a flush). I also don't believe writing to a SATA drive with O_DIRECT will force cache flushes of the drive's writeback cache.. although I just tested that one and it seems to actually trigger a scsi cache sync. Interesting. -Aaron On 9/7/17 10:55 PM, Sven Oehme wrote: > I am not sure what exactly you are looking for but all blockdevices are > opened with O_DIRECT , we never cache anything on this layer . > > > On Thu, Sep 7, 2017, 7:11 PM Aaron Knister > wrote: > > Hi Everyone, > > This is something that's come up in the past and has recently resurfaced > with a project I've been working on, and that is-- it seems to me as > though mmfsd never attempts to flush the cache of the block devices its > writing to (looking at blktrace output seems to confirm this). Is this > actually the case? I've looked at the gpl headers for linux and I don't > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > GPFS may very well be using that I've missed. That's why I'm asking :) > > I figure with FPO being pushed as an HDFS replacement using commodity > drives this feature has *got* to be in the code somewhere. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Fri Sep 8 04:26:02 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 23:26:02 -0400 Subject: [gpfsug-discuss] Happy 20th birthday GPFS !! In-Reply-To: References: Message-ID: <4a9feeb2-bb9d-8c9a-e506-926d8537cada@nasa.gov> Sounds like celebratory cake is in order for the users group in a few weeks ;) On 9/6/17 7:30 PM, Glen Corneau wrote: > Sorry I missed the anniversary of your conception ?(announcement letter) > back on August 26th, so I hope you'll accept my belated congratulations > on this long and exciting journey! > > https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS297-318 > > I remember your parent, PIOFS, as well! ?Ahh the fun times. > ------------------ > Glen Corneau > Power Systems > Washington Systems Center > gcorneau at us.ibm.com > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From vpuvvada at in.ibm.com Fri Sep 8 06:00:46 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 8 Sep 2017 10:30:46 +0530 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" To: gpfsug main discussion list Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org I had similar issue, I had to recover connection to home From: "John Hearns" To: "gpfsug main discussion list" Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. 
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 8 06:21:47 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 8 Sep 2017 10:51:47 +0530 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig In-Reply-To: References: Message-ID: mmafmconfig command should be run on the target path (path specified in the afmTarget option when fileset is created). If many filesets are sharing the same target (ex independent writer mode) , enable AFM once on target path. Run the command at home cluster. mmafmconifg enable afmTarget ~Venkat (vpuvvada at in.ibm.com) From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/07/2017 09:04 PM Subject: Re: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig Sent by: gpfsug-discuss-bounces at spectrumscale.org I think you need to configure a gateway node (use mmchnode to change an existing node class to gateway) Then use mmafmconfig to setup export server maps on the gateway node. e.g. mmafmconfig ?add ?mapping1? ?export-map ?nfsServerIP?/?GatewayNode? (double quotes not required) mafmconfig show all Map name: mapping1 Export server map: IP/GatewayNode From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 07 September 2017 16:15 To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig If I have an AFM setup where the home is located on a generic NFS share, let?s say server:/volume/share When I come ot set this up does it make sense to run mmafmconfig on /volume/share ? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). 
Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=kKlSEJqmVE6q8Qt02JNaDLsewp13C0yRAmlfc_djRkk&s=JIbuXlCiReZx3ws5__6juuGC-sAqM74296BuyzgyNYg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From gellis at ocf.co.uk Fri Sep 8 08:04:51 2017 From: gellis at ocf.co.uk (Georgina Ellis) Date: Fri, 8 Sep 2017 07:04:51 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: Message-ID: <0CBB283A-A0A9-4FC9-A1CD-9E019D74CDB9@ocf.co.uk> I am still populating your lot 2 response - it is split across 3 word docs and a whole heap of emails so easier for me to keep going - I dropped u off a lot of emails to save filling your inbox :-) Could you poke around other tenders for the portal question please? Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. 
> > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** From john.hearns at asml.com Fri Sep 8 08:26:01 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 8 Sep 2017 07:26:01 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gellis at ocf.co.uk Fri Sep 8 08:33:51 2017 From: gellis at ocf.co.uk (Georgina Ellis) Date: Fri, 8 Sep 2017 07:33:51 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: Message-ID: <93DCF805-F703-4ED5-A079-A44992A9268C@ocf.co.uk> Apologies All, slip of the keyboard and not a comment on GPFS! Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** From Sandra.McLaughlin at astrazeneca.com Fri Sep 8 10:12:02 2017 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Fri, 8 Sep 2017 09:12:02 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: John, I had a period when I had to delete and remake AFM filesets rather frequently ? this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. 
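Before forcing anything, it can be worth confirming what AFM still holds on the cache fileset. A rough sketch, using invented device and fileset names (gpfs01, cachefset) rather than anything from this thread, and assuming the mmlsfileset/mmafmctl/mmlssnapshot syntax of recent 4.2.x releases:

# Show the AFM attributes and cache state of the fileset
mmlsfileset gpfs01 cachefset --afm -L
mmafmctl gpfs01 getstate -j cachefset

# List any fileset snapshots still attached; an internal pcache recovery
# snapshot left behind by an interrupted recovery should show up here
mmlssnapshot gpfs01 -j cachefset
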
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 8 11:57:14 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 8 Sep 2017 10:57:14 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Sandra, Thankyou for the help. 
I have a support ticket outstanding, and will see what is suggested. I am sure this is a simple matter of deleting the fileset as you say! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McLaughlin, Sandra M Sent: Friday, September 08, 2017 11:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? John, I had a period when I had to delete and remake AFM filesets rather frequently ? this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kkr at lbl.gov Fri Sep 8 11:58:05 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 8 Sep 2017 03:58:05 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) In-Reply-To: <6EF4187F-D8A1-4927-9E4F-4DF703DA04F5@lbl.gov> References: <6EF4187F-D8A1-4927-9E4F-4DF703DA04F5@lbl.gov> Message-ID: Hello, The agenda for the GPFS Day during HPCXXL is fairly fleshed out here: http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ See notes on registration below, which is free but required. Use the HPCXXL registration form, which has a $0 GPFS Day registration option. Hope to see some of you there. Best, Kristy > On Aug 21, 2017, at 3:33 PM, Kristy Kallback-Rose wrote: > > If you plan on attending the GPFS Day, please use the HPCXXL registration form (link to Eventbrite registration at the link below). The GPFS day is a free event, but you *must* register so we can make sure there are enough seats and food available. > > If you would like to speak or suggest a topic, please let me know. > > http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ > > The agenda is still being worked on, here are some likely topics: > > --RoadMap/Updates > --"New features - New Bugs? (Julich) > --GPFS + Openstack (CSCS) > --ORNL Update on Spider3-related GPFS work > --ANL Site Update > --File Corruption Session > > Best, > Kristy > >> On Aug 8, 2017, at 11:33 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> The GPFS Day of the HPCXXL conference is confirmed for Thursday, September 28th. Here is an updated URL, the agenda and registration are still being put together http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ . The GPFS Day will require registration, so we can make sure there is enough room (and coffee/food) for all attendees ?however, there will be no registration fee if you attend the GPFS Day only. >> >> I?ll send another update when the agenda is closer to settled. >> >> Cheers, >> Kristy >> >>> On Jul 7, 2017, at 3:32 PM, Kristy Kallback-Rose > wrote: >>> >>> Hello, >>> >>> More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. >>> >>> This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. >>> >>> The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. >>> >>> More as we get closer to the date and details are settled. >>> >>> Cheers, >>> Kristy >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc.ken.tw25qn at gmail.com Fri Sep 8 19:30:32 2017 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Fri, 8 Sep 2017 19:30:32 +0100 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: <93DCF805-F703-4ED5-A079-A44992A9268C@ocf.co.uk> Message-ID: Not on too many G&Ts Georgina? How are things. 
Ken Atkinson On 8 Sep 2017 08:33, "Georgina Ellis" wrote: Apologies All, slip of the keyboard and not a comment on GPFS! Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2. 3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2. 3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 22:14:04 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 8 Sep 2017 17:14:04 -0400 Subject: [gpfsug-discuss] multicluster security In-Reply-To: References: <83936033-ce82-0a9b-3714-1dbea4c317db@nasa.gov> Message-ID: <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> Interesting! Thank you for the explanation. This makes me wish GPFS had a client access model that more closely mimicked parallel NAS, specifically for this reason. That then got me wondering about pNFS support. I've not been able to find much about that but in theory Ganesha supports pNFS. Does anyone know of successful pNFS testing with GPFS and if so how one would set up such a thing? -Aaron On 08/25/2017 06:41 PM, IBM Spectrum Scale wrote: > > Hi Aaron, > > If cluster A uses the mmauth command to grant a file system read-only > access to a remote cluster B, nodes on cluster B can only mount that > file system with read-only access. But the only checking being done at > the RPC level is the TLS authentication. This should prevent non-root > users from initiating RPCs, since TLS authentication requires access > to the local cluster's private key. However, a root user on cluster B, > having access to cluster B's private key, might be able to craft RPCs > that may allow one to work around the checks which are implemented at > the file system level. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks > Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please > contact 1-800-237-5511 in the United States or your local IBM Service > Center in other countries. > > The forum is informally monitored as time permits and should not be > used for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for Aaron Knister ---08/21/2017 11:04:06 PM---Hi > Everyone, I have a theoretical question about GPFS multiAaron Knister > ---08/21/2017 11:04:06 PM---Hi Everyone, I have a theoretical question > about GPFS multiclusters and security. 
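To make it concrete where that read-only grant actually lives, here is a minimal sketch of a multicluster export. All cluster, node, file system and path names are invented placeholders, and the real procedure has more steps (key exchange between the two administrators, setting the cipherList, daemon restarts), so treat this as an outline rather than the documented sequence:

# On the owning cluster (cluster A): generate a key, register cluster B, grant ro access
mmauth genkey new
mmauth add clusterB.example.com -k /tmp/clusterB_id_rsa.pub
mmauth grant clusterB.example.com -f gpfs01 -a ro     # the file-system-level check discussed above

# On the accessing cluster (cluster B): define the remote cluster and remote file system
mmremotecluster add clusterA.example.com -n nsd01,nsd02 -k /tmp/clusterA_id_rsa.pub
mmremotefs add rgpfs01 -f gpfs01 -C clusterA.example.com -T /gpfs/rgpfs01 -A no -o ro
mmmount rgpfs01
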
> > From: Aaron Knister > To: gpfsug main discussion list > Date: 08/21/2017 11:04 PM > Subject: [gpfsug-discuss] multicluster security > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I have a theoretical question about GPFS multiclusters and security. > Let's say I have clusters A and B. Cluster A is exporting a filesystem > as read-only to cluster B. > > Where does the authorization burden lay? Meaning, does the security rely > on mmfsd in cluster B to behave itself and enforce the conditions of the > multi-cluster export? Could someone using the credentials on a > compromised node in cluster B just start sending arbitrary nsd > read/write commands to the nsds from cluster A (or something along those > lines)? Do the NSD servers in cluster A do any sort of sanity or > security checking on the I/O requests coming from cluster B to the NSDs > they're serving to exported filesystems? > > I imagine any enforcement would go out the window with shared disks in a > multi-cluster environment since a compromised node could just "dd" over > the LUNs. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=oK_bEPbjuD7j6qLTHbe7HM4ujUlpcNYtX3tMW2QC7_w&s=BliMQ0pToLIIiO1jfyUp2Q3icewcONrcmHpsIj_hMtY&e= > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Fri Sep 8 22:21:00 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 08 Sep 2017 21:21:00 +0000 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: Hi, the code assumption is that the underlying device has no volatile write cache, i was absolute sure we have that somewhere in the FAQ, but i couldn't find it, so i will talk to somebody to correct this. if i understand https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt correct one could enforce this by setting REQ_FUA, but thats not explicitly set today, at least i can't see it. i will discuss this with one of our devs who owns this code and come back. sven On Thu, Sep 7, 2017 at 8:05 PM Aaron Knister wrote: > Thanks Sven. I didn't think GPFS itself was caching anything on that > layer, but it's my understanding that O_DIRECT isn't sufficient to force > I/O to be flushed (e.g. the device itself might have a volatile caching > layer). Take someone using ZFS zvol's as NSDs. I can write() all day log > to that zvol (even with O_DIRECT) but there is absolutely no guarantee > those writes have been committed to stable storage and aren't just > sitting in RAM until an fsync() occurs (or some other bio function that > causes a flush). I also don't believe writing to a SATA drive with > O_DIRECT will force cache flushes of the drive's writeback cache.. 
> although I just tested that one and it seems to actually trigger a scsi > cache sync. Interesting. > > -Aaron > > On 9/7/17 10:55 PM, Sven Oehme wrote: > > I am not sure what exactly you are looking for but all blockdevices are > > opened with O_DIRECT , we never cache anything on this layer . > > > > > > On Thu, Sep 7, 2017, 7:11 PM Aaron Knister > > wrote: > > > > Hi Everyone, > > > > This is something that's come up in the past and has recently > resurfaced > > with a project I've been working on, and that is-- it seems to me as > > though mmfsd never attempts to flush the cache of the block devices > its > > writing to (looking at blktrace output seems to confirm this). Is > this > > actually the case? I've looked at the gpl headers for linux and I > don't > > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > > GPFS may very well be using that I've missed. That's why I'm asking > :) > > > > I figure with FPO being pushed as an HDFS replacement using commodity > > drives this feature has *got* to be in the code somewhere. > > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Sat Sep 9 09:05:31 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Sat, 9 Sep 2017 10:05:31 +0200 Subject: [gpfsug-discuss] multicluster security In-Reply-To: <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> References: <83936033-ce82-0a9b-3714-1dbea4c317db@nasa.gov> <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Sep 11 01:43:56 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 10 Sep 2017 20:43:56 -0400 Subject: [gpfsug-discuss] tuning parameters question Message-ID: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> Hi All (but mostly Sven), I stumbled across this great gem: files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf and I'm wondering which, if any, of those tuning parameters are still relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is particularly ugly and the storage doesn't appear to be bottlenecked. 
I see a lot of waiters like these: Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' and I'm wondering if there's anything immediate one would suggest to help with that. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Mon Sep 11 01:50:39 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 10 Sep 2017 20:50:39 -0400 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> Message-ID: <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> As an aside, my initial attempt was to use Ganesha via CES but the performance was significantly worse than CNFS for this workload. The docs seem to suggest that CNFS performs better for metadata intensive workloads which certainly seems to fit the bill here. -Aaron On 9/10/17 8:43 PM, Aaron Knister wrote: > Hi All (but mostly Sven), > > I stumbled across this great gem: > > files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf > > and I'm wondering which, if any, of those tuning parameters are still > relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm > exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is > particularly ugly and the storage doesn't appear to be bottlenecked. 
> > I see a lot of waiters like these: > > Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > > and I'm wondering if there's anything immediate one would suggest to > help with that. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From stefan.dietrich at desy.de Mon Sep 11 08:40:14 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 11 Sep 2017 09:40:14 +0200 (CEST) Subject: [gpfsug-discuss] Switch from IPoIB connected mode to datagram with ESS 5.2.0? Message-ID: <743361352.9211728.1505115614463.JavaMail.zimbra@desy.de> Hello, during reading the upgrade docs for ESS 5.2.0, I noticed a change in the IPoIB mode. Now it specifies, that datagram (CONNECTED_MODE=no) instead of connected mode should be used. All earlier versions used connected mode. I am wondering about the reason for this change? Or is this only relevant for bonded IPoIB interfaces? Regards, Stefan -- ------------------------------------------------------------------------ Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) Ein Forschungszentrum der Helmholtz-Gemeinschaft Notkestr. 85 phone: +49-40-8998-4696 22607 Hamburg e-mail: stefan.dietrich at desy.de Germany ------------------------------------------------------------------------ From john.hearns at asml.com Mon Sep 11 08:41:54 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 11 Sep 2017 07:41:54 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Thankyou all for advice. The ?-p? option was the fix here (thankyou to IBM support). From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McLaughlin, Sandra M Sent: Friday, September 08, 2017 11:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? John, I had a period when I had to delete and remake AFM filesets rather frequently ? 
this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
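Pulling the commands from this thread into one place, the cleanup sequence looks roughly like the following. Device, fileset and snapshot names are placeholders, and the -p flag is the undocumented option mentioned above, so it is best used only when support advises it:

mmunlinkfileset gpfs01 cachefset -f
mmdelsnapshot gpfs01 cachefset.afm.1234 -j cachefset -p   # -p is undocumented; support-guided only
mmdelfileset gpfs01 cachefset -f
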
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Mon Sep 11 09:11:15 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 11 Sep 2017 10:11:15 +0200 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From ed.swindelles at uconn.edu Mon Sep 11 16:49:15 2017 From: ed.swindelles at uconn.edu (Swindelles, Ed) Date: Mon, 11 Sep 2017 15:49:15 +0000 Subject: [gpfsug-discuss] UConn hiring GPFS administrator Message-ID: The University of Connecticut is hiring three full time, permanent technical positions for its HPC team on the Storrs campus. One of these positions is focused on storage administration, including a GPFS cluster. I would greatly appreciate it if you would forward this announcement to contacts of yours who may have an interest in these positions. Here are direct links to the job descriptions and applications: HPC Storage Administrator http://s.uconn.edu/3tx HPC Systems Administrator (2 positions to be filled) http://s.uconn.edu/3tw Thank you, -- Ed Swindelles Team Lead for Research Technology University of Connecticut 860-486-4522 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Sep 11 23:15:10 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 11 Sep 2017 18:15:10 -0400 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> Message-ID: <9de64193-c60c-8ee1-b681-6cfe3993772b@nasa.gov> Thanks, Olaf. I ended up un-setting a bunch of settings that are now auto-tuned (worker1threads, worker3threads, etc.) and just set workerthreads as you suggest. That combined with increasing maxfilestocache to above the max concurrent open file threshold of the workload got me consistently with in 1%-3% of the performance of the same storage hardware running btrfs instead of GPFS. I think that's pretty darned good considering the additional complexity GPFS has over btrfs of being a clustered filesystem. Plus I now get NFS server failover for very little effort and without having to deal with corosync or pacemaker. -Aaron On 9/11/17 4:11 AM, Olaf Weiser wrote: > Hi Aaron , > > 0,0009 s response time for your meta data IO ... seems to be a very > good/fast storage BE.. which is hard to improve.. > you can raise the parallelism a bit for accessing metadata , but if this > will help to improve your "workload" is not assured > > The worker3threads parameter specifies the number of threads to use for > inode prefetch. Usually , I would suggest, that you should not touch > single parameters any longer. By the great improvements of the last few > releases.. GPFS can calculate / retrieve the right settings > semi-automatically... > You only need to set simpler "workerThreads" .. > > But in your case , you can see, if this more specific value will help > you out . > > depending on your blocksize and average filesize .. you may see > additional improvements when tuning nfsPrefetchStrategy , which tells > GPFS to consider all IOs wihtin */N/* blockboundaries as sequential ?and > starts prefetch > > l.b.n.t. set ignoreprefetchLunCount to yes .. (if not already done) . 
> this helps GPFS to use all available workerThreads > > cheers > olaf > > > > From: Aaron Knister > To: > Date: 09/11/2017 02:50 AM > Subject: Re: [gpfsug-discuss] tuning parameters question > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > As an aside, my initial attempt was to use Ganesha via CES but the > performance was significantly worse than CNFS for this workload. The > docs seem to suggest that CNFS performs better for metadata intensive > workloads which certainly seems to fit the bill here. > > -Aaron > > On 9/10/17 8:43 PM, Aaron Knister wrote: > > Hi All (but mostly Sven), > > > > I stumbled across this great gem: > > > > files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf > > > > and I'm wondering which, if any, of those tuning parameters are still > > relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm > > exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is > > particularly ugly and the storage doesn't appear to be bottlenecked. > > > > I see a lot of waiters like these: > > > > Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > > > and I'm wondering if there's anything immediate one would suggest to > > help with that. 
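For anyone wanting to try the knobs mentioned in this exchange, a hedged sketch of the corresponding mmchconfig calls follows. The node class name and all values are illustrative only and need sizing against the real workload, several of the settings only take effect after the daemon is restarted on the affected nodes, and nfsPrefetchStrategy/ignorePrefetchLUNCount should be checked against the documentation for the installed release:

# Let GPFS derive the old worker1/worker3-style sub-values from one umbrella setting
mmchconfig workerThreads=512 -N cnfsNodes

# Keep the inode/file cache above the peak number of concurrently open files
mmchconfig maxFilesToCache=1000000 -N cnfsNodes

# Treat IOs within N block boundaries as sequential, and do not limit
# prefetch by the number of visible LUNs
mmchconfig nfsPrefetchStrategy=1 -N cnfsNodes
mmchconfig ignorePrefetchLUNCount=yes -N cnfsNodes

# Confirm what the daemon is actually running with
mmdiag --config | egrep -i 'workerthreads|maxfilestocache|prefetch'
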
> > > > -Aaron > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From zacekm at img.cas.cz Tue Sep 12 10:40:35 2017 From: zacekm at img.cas.cz (Michal Zacek) Date: Tue, 12 Sep 2017 11:40:35 +0200 Subject: [gpfsug-discuss] Wrong nodename after server restart Message-ID: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. 
gpfs version: 4.2.3-2 (CentOS 7) From secretary at gpfsug.org Tue Sep 12 15:22:41 2017 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Tue, 12 Sep 2017 15:22:41 +0100 Subject: [gpfsug-discuss] SS UG UK 2018 Message-ID: Dear all, A date for your diary, #SSUG18 in the UK will be taking place on April 18th & 19th 2018. Please mark it in your diaries now! We'll confirm other details (venue, agenda etc.) nearer the time, but the date is confirmed. Thanks, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Sep 12 16:01:21 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 12 Sep 2017 11:01:21 -0400 Subject: [gpfsug-discuss] Wrong nodename after server restart In-Reply-To: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> References: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> Message-ID: Michal, When a node is added to a cluster that has a different domain than the rest of the nodes in the cluster, the GPFS daemons running on the various nodes can develop an inconsistent understanding of what the common suffix of all the domain names are. The symptoms you show with the "tsctl shownodes up" output, and in particular the incorrect node names of the two nodes you restarted, as seen on a node you did not restart, are consistent with this problem. I also note your cluster appears to have the necessary pre-condition to trip on this problem, whale.img.cas.cz does not share a common suffix with the other nodes in the cluster. The common suffix of the other nodes in the cluster is ".img.local". Was whale.img.cas.cz recently added to the cluster? Unfortunately, the general work-around is to recycle all the nodes at once: mmshutdown -a, followed by mmstartup -a. I hope this helps. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 09/12/2017 05:41 AM Subject: [gpfsug-discuss] Wrong nodename after server restart Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. 
Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. gpfs version: 4.2.3-2 (CentOS 7) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=l_sz-tPolX87WmSf2zBhhPpggnfQJKp7-BqV8euBp7A&s=XSPGkKRMza8PhYQg8AxeKW9cOTNeCI9uph486_6Xajo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Tue Sep 12 16:36:06 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 12 Sep 2017 15:36:06 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: Message-ID: Well George is not the only one to have replied to the list with a one to one message. ? Remember folks, this mailing list has a *lot* of people on it. Hope my message is last that forgets who is in the 'To' field. Daniel Daniel Kidger Technical Sales Specialist, IBM UK IBM Spectrum Storage Software daniel.kidger at uk.ibm.com +44 (0)7818 522266 > On 8 Sep 2017, at 19:30, Ken Atkinson wrote: > > Not on too many G&Ts Georgina? > How are things. > Ken Atkinson > > On 8 Sep 2017 08:33, "Georgina Ellis" wrote: > Apologies All, slip of the keyboard and not a comment on GPFS! 
> > Sent from my iPhone > > > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > > > Send gpfsug-discuss mailing list submissions to > > gpfsug-discuss at spectrumscale.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > > gpfsug-discuss-request at spectrumscale.org > > > > You can reach the person managing the list at > > gpfsug-discuss-owner at spectrumscale.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of gpfsug-discuss digest..." > > > > > > Today's Topics: > > > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > > From: "Malka, Janusz" > > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > > Content-Type: text/plain; charset="utf-8" > > > > I had similar issue, I had to recover connection to home > > > > > > From: "John Hearns" > > To: "gpfsug main discussion list" > > Sent: Thursday, 7 September, 2017 17:52:19 > > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > > > Mmdelfileset responds that : > > > > Fileset obfuscated has 1 fileset snapshot(s). > > > > > > > > When I try to delete the snapshot: > > > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > > > > > I find this reference, which is about as useful as a wet haddock: > > > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > > > > > The advice of the gallery is sought, please. > > > > > > > > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- > > An HTML attachment was scrubbed... 
> > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 7 Sep 2017 21:16:34 +0000 > > From: "Christof Schmitt" > > To: gpfsug-discuss at spectrumscale.org > > Cc: gpfsug-discuss at spectrumscale.org > > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > > Message-ID: > > > > > > Content-Type: text/plain; charset="us-ascii" > > > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.mills at nasa.gov Tue Sep 12 17:06:23 2017 From: jonathan.mills at nasa.gov (Jonathan Mills) Date: Tue, 12 Sep 2017 12:06:23 -0400 (EDT) Subject: [gpfsug-discuss] Support for SLES 12 SP3 Message-ID: SLES 12 SP3 has been released. And for what it?s worth, there does not appear to be substantial changes in either kernel or glibc as compared to SLES 12 SP2. In fact, the latest SLES 12 SP2 kernel is ?4.4.74-92.29?, while the initial SLES 12 SP3 kernel is ?4.4.73-5.1?. Given this, I wanted to ask the team at IBM: 1) have you begun looking into SLES 12 SP3 yet? 2) if so, do you have any idea when you might release a fully supported version of Spectrum Scale for SLES 12 SP3? Those of us who run SLES and are looking to deploy new infrastructure this fall would prefer to do so on the latest rev of our OS, as opposed to one that is already on life support... -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From Greg.Lehmann at csiro.au Wed Sep 13 00:12:55 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 12 Sep 2017 23:12:55 +0000 Subject: [gpfsug-discuss] Support for SLES 12 SP3 In-Reply-To: References: Message-ID: <67f390a558244c41b154a7a6a9e5efe8@exch1-cdc.nexus.csiro.au> +1. We are interested in SLES 12 SP3 too. BTW had anybody done any comparisons of SLES 12 SP2 (4.4) kernel vs RHEL 7.3 in terms of GPFS IO performance? I would think the 4.4 kernel might give it an edge. I'll probably get around to comparing them myself one day, but if anyone else has some numbers... -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Mills Sent: Wednesday, 13 September 2017 2:06 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Support for SLES 12 SP3 SLES 12 SP3 has been released. And for what it?s worth, there does not appear to be substantial changes in either kernel or glibc as compared to SLES 12 SP2. In fact, the latest SLES 12 SP2 kernel is ?4.4.74-92.29?, while the initial SLES 12 SP3 kernel is ?4.4.73-5.1?. Given this, I wanted to ask the team at IBM: 1) have you begun looking into SLES 12 SP3 yet? 2) if so, do you have any idea when you might release a fully supported version of Spectrum Scale for SLES 12 SP3? 
Those of us who run SLES and are looking to deploy new infrastructure this fall would prefer to do so on the latest rev of our OS, as opposed to one that is already on life support... -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From scale at us.ibm.com Wed Sep 13 22:33:30 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 13 Sep 2017 17:33:30 -0400 Subject: [gpfsug-discuss] Fw: Wrong nodename after server restart Message-ID: ----- Forwarded by Eric Agar/Poughkeepsie/IBM on 09/13/2017 05:32 PM ----- From: IBM Spectrum Scale/Poughkeepsie/IBM To: Michal Zacek Date: 09/13/2017 05:29 PM Subject: Re: [gpfsug-discuss] Wrong nodename after server restart Sent by: Eric Agar Hello Michal, It should not be necessary to delete whale.img.cas.cz and rename it. But, that is an option you can take, if you prefer it. If you decide to take that option, please see the last paragraph of this response. The confusion starts at the moment a node is added to the active cluster where the new node does not have the same common domain suffix as the nodes that were already in the cluster. The confusion increases when the GPFS daemons on some nodes, but not all nodes, are recycled. Doing mmshutdown -a, followed by mmstartup -a, once after the new node has been added allows all GPFS daemons on all nodes to come up at the same time and arrive at the same answer to the question, "what is the common domain suffix for all the nodes in the cluster now?" In the case of your cluster, the answer will be "the common domain suffix is the empty string" or, put another way, "there is no common domain suffix"; that is okay, as long as all the GPFS daemons come to the same conclusion. After you recycle the cluster, you can check to make sure all seems well by running "tsctl shownodes up" on every node, and make sure the answer is correct on each node. If the mmshutdown -a / mmstartup -a recycle works, the problem should not recur with the current set of nodes in the cluster. Even as individual GPFS daemons are recycled going forward, they should still understand the cluster's nodes have no common domain suffix. However, I can imagine sequences of events that would cause the issue to occur again after nodes are deleted or added to the cluster while the cluster is active. For example, if whale.img.cas.cz were to be deleted from the current cluster, that action would restore the cluster to having a common domain suffix of ".img.local", but already running GPFS daemons would not realize it. If the delete of whale occurred while the cluster was active, subsequent recycling of the GPFS daemon on just a subset of the nodes would cause the recycled daemons to understand the common domain suffix to now be ".img.local". But, daemons that had not been recycled would still think there is no common domain suffix. The confusion would occur again. On the other hand, adding and deleting nodes to/from the cluster should not cause the issue to occur again as long as the cluster continues to have the same (in this case, no) common domain suffix. If you decide to delete whale.img.case.cz, rename it to have the ".img.local" domain suffix, and add it back to the cluster, it would be best to do so after all the GPFS daemons are shut down with mmshutdown -a, but before any of the daemons are restarted with mmstartup. This would allow all the subsequent running daemons to come to the conclusion that ".img.local" is now the common domain suffix. I hope this helps. 
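To make the work-around above concrete, here is a minimal shell sketch of the recycle-and-verify sequence it describes. The optional delete/re-add step and the renamed host name (whale.img.local) are assumptions to adapt, not a tested recipe.

# 1. Stop the GPFS daemon on every node, so that all daemons later re-derive
#    the common domain suffix together.
mmshutdown -a

# 2. Optional, as discussed above: while everything is down, remove the node
#    that breaks the common suffix and add it back under the common domain.
#    (whale.img.local is a hypothetical renamed host.)
# mmdelnode -N whale.img.cas.cz
# mmaddnode -N whale.img.local

# 3. Start the daemons everywhere at once.
mmstartup -a

# 4. Check that every node now agrees on the daemon node names.
mmdsh -N all "/usr/lpp/mmfs/bin/tsctl shownodes up" | tr ',' '\n'
mmlscluster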
Regards, Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: IBM Spectrum Scale Date: 09/13/2017 03:42 AM Subject: Re: [gpfsug-discuss] Wrong nodename after server restart Hello yes you are correct, Whale was added two days a go. It's necessary to delete whale.img.cas.cz from cluster before mmshutdown/mmstartup? If the two domains may cause problems in the future I can rename whale (and all planed nodes) to img.local suffix. Many thanks for the prompt reply. Regards Michal Dne 12.9.2017 v 17:01 IBM Spectrum Scale napsal(a): Michal, When a node is added to a cluster that has a different domain than the rest of the nodes in the cluster, the GPFS daemons running on the various nodes can develop an inconsistent understanding of what the common suffix of all the domain names are. The symptoms you show with the "tsctl shownodes up" output, and in particular the incorrect node names of the two nodes you restarted, as seen on a node you did not restart, are consistent with this problem. I also note your cluster appears to have the necessary pre-condition to trip on this problem, whale.img.cas.cz does not share a common suffix with the other nodes in the cluster. The common suffix of the other nodes in the cluster is ".img.local". Was whale.img.cas.cz recently added to the cluster? Unfortunately, the general work-around is to recycle all the nodes at once: mmshutdown -a, followed by mmstartup -a. I hope this helps. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 09/12/2017 05:41 AM Subject: [gpfsug-discuss] Wrong nodename after server restart Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. 
Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. gpfs version: 4.2.3-2 (CentOS 7) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=l_sz-tPolX87WmSf2zBhhPpggnfQJKp7-BqV8euBp7A&s=XSPGkKRMza8PhYQg8AxeKW9cOTNeCI9uph486_6Xajo&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Michal ???ek | Information Technologies +420 296 443 128 +420 296 443 333 michal.zacek at img.cas.cz www.img.cas.cz Institute of Molecular Genetics of the ASCR, v. v. i., V?de?sk? 1083, 142 20 Prague 4, Czech Republic ID: 68378050 | VAT ID: CZ68378050 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1997 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Thu Sep 14 01:18:51 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 13 Sep 2017 20:18:51 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. Message-ID: <52657.1505348331@turing-police.cc.vt.edu> So we have a number of very similar policy files that get applied for file migration etc. 
And they vary drastically in the runtime to process, apparently due to different selections on whether to do the work in parallel. Running a set of rules with 'mmapplypolicy -I defer' that look like this: RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' THRESHOLD(0,100,0) WEIGHT(FILE_SIZE) TO POOL 'VBI_FILES' FOR FILESET('vbi') WHERE (mb_allocated >= 8) for 10 filesets can scan 325M directory entries in 6 minutes, and sort and evaluate the policy in 3 more minutes. However, this takes a bit over 30 minutes for the scan and another 20 for sorting and policy evaluation over the same set of filesets: RULE 'VBI_FILES_RULE' LIST 'pruned_files' THRESHOLD(90,80) WEIGHT(FILE_SIZE) FOR FILESET('vbi') WHERE (mb_allocated >= 8) even though the output is essentially identical. Why is LIST so much more expensive than 'MIGRATE" with '-I defer'? I could understand if I had an expensive SHOW clause, but there isn't one here (and a different policy that I run that *does* have a big SHOW clause takes almost the same amount of time as the minimal LIST).... I'm thinking that it has *something* to do with the MIGRATE job outputting: [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. while the LIST job says: [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. (...) [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. (Both output the same message during the 'Directory entries scanned: 0.' phase, but I suspect MIGRATE is multi-threading that part as well, as it completes much faster). What's the controlling factor in mmapplypolicy's decision whether or not to parallelize the policy? From oehmes at gmail.com Thu Sep 14 01:28:46 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 14 Sep 2017 00:28:46 +0000 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: <52657.1505348331@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: can you please share the entire command line you are using ? also gpfs version, mmlsconfig output would help as well as if this is a shared storage filesystem or a system using local disks. thx. Sven On Wed, Sep 13, 2017 at 5:19 PM wrote: > So we have a number of very similar policy files that get applied for file > migration etc. And they vary drastically in the runtime to process, > apparently > due to different selections on whether to do the work in parallel. > > Running a set of rules with 'mmapplypolicy -I defer' that look like this: > > RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' > THRESHOLD(0,100,0) > WEIGHT(FILE_SIZE) > TO POOL 'VBI_FILES' > FOR FILESET('vbi') > WHERE (mb_allocated >= 8) > > for 10 filesets can scan 325M directory entries in 6 minutes, and sort and > evaluate the policy in 3 more minutes. > > However, this takes a bit over 30 minutes for the scan and another 20 for > sorting and policy evaluation over the same set of filesets: > > RULE 'VBI_FILES_RULE' LIST 'pruned_files' > THRESHOLD(90,80) > WEIGHT(FILE_SIZE) > FOR FILESET('vbi') > WHERE (mb_allocated >= 8) > > even though the output is essentially identical. Why is LIST so much more > expensive than 'MIGRATE" with '-I defer'? I could understand if I > had an > expensive SHOW clause, but there isn't one here (and a different policy > that I > run that *does* have a big SHOW clause takes almost the same amount of > time as > the minimal LIST).... 
> > I'm thinking that it has *something* to do with the MIGRATE job outputting: > > [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 > files scanned. > (...) > [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 > records scanned. > > while the LIST job says: > > [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. > (...) > [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. > > (Both output the same message during the 'Directory entries scanned: 0.' > phase, but I suspect MIGRATE is multi-threading that part as well, as it > completes much faster). > > What's the controlling factor in mmapplypolicy's decision whether or > not to parallelize the policy? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kh.atmane at gmail.com Thu Sep 14 13:49:55 2017 From: kh.atmane at gmail.com (atmane) Date: Thu, 14 Sep 2017 13:49:55 +0100 Subject: [gpfsug-discuss] Disk change problem in gss GNR Message-ID: dear all, I change A Disk In Gss Storage Server mmchcarrier BB1RGL --release --pdisk 'e1d1s02' mmchcarrier BB1RGL --replace --pdisk 'e1d1s02' after replace disk Now I Have 2 Discs In My Gss the first disc was well changed name = "e1d1s02" the second disk still after I use this cmd mmdelpdisk BB1RGL --pdisk e1d1s02#004 -a the disk is still in use i need to reboot the system or ?? mmlspdisk all | less pdisk: replacementPriority = 1000 name = "e1d1s02" device = "/dev/sdik,/dev/sdih" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "ok" capacity = 3000034656256 freeSpace = 1453846429696 fru = "00W1572" location = "SV30820390-1-2" WWN = "naa.5000C5008D783E37" server = "gss0-ib0" pdisk: replacementPriority = 1000 name = "e1d1s02#004" device = "" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "missing/noPath/systemDrain/adminDrain/noRGD/noVCD" capacity = 3000034656256 freeSpace = 1599875317760 fru = "00W1572" location = "" WWN = "naa.5000C50056714E83" server = "gss0-ib0" -- -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From makaplan at us.ibm.com Thu Sep 14 19:55:39 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 14 Sep 2017 14:55:39 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: <52657.1505348331@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: Read the doc again. Specify both -g and -N options on the command line to get fully parallel directory and inode/policy scanning. I'm curious as to what you're trying to do with THRESHOLD(0,100,0) ... Perhaps premigrate everything (that matches the other conditions)? You are correct about I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. If you don't see messages like that, you did not specify both -N and -g. From: valdis.kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Date: 09/13/2017 08:19 PM Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. Sent by: gpfsug-discuss-bounces at spectrumscale.org So we have a number of very similar policy files that get applied for file migration etc. 
And they vary drastically in the runtime to process, apparently due to different selections on whether to do the work in parallel. Running a set of rules with 'mmapplypolicy -I defer' that look like this: RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' THRESHOLD(0,100,0) WEIGHT(FILE_SIZE) TO POOL 'VBI_FILES' FOR FILESET('vbi') WHERE (mb_allocated >= 8) for 10 filesets can scan 325M directory entries in 6 minutes, and sort and evaluate the policy in 3 more minutes. However, this takes a bit over 30 minutes for the scan and another 20 for sorting and policy evaluation over the same set of filesets: RULE 'VBI_FILES_RULE' LIST 'pruned_files' THRESHOLD(90,80) WEIGHT(FILE_SIZE) FOR FILESET('vbi') WHERE (mb_allocated >= 8) even though the output is essentially identical. Why is LIST so much more expensive than 'MIGRATE" with '-I defer'? I could understand if I had an expensive SHOW clause, but there isn't one here (and a different policy that I run that *does* have a big SHOW clause takes almost the same amount of time as the minimal LIST).... I'm thinking that it has *something* to do with the MIGRATE job outputting: [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. while the LIST job says: [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. (...) [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. (Both output the same message during the 'Directory entries scanned: 0.' phase, but I suspect MIGRATE is multi-threading that part as well, as it completes much faster). What's the controlling factor in mmapplypolicy's decision whether or not to parallelize the policy? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=SGbwD3m5mZ16_vwIFK8Ym48lwdF1tVktnSao0a_tkfA&s=sLt9AtZiZ0qZCKzuQoQuyxN76_R66jfAwQxdIY-w2m0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Sep 14 21:09:40 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 14 Sep 2017 16:09:40 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: <26551.1505419780@turing-police.cc.vt.edu> On Thu, 14 Sep 2017 14:55:39 -0400, "Marc A Kaplan" said: > Read the doc again. Specify both -g and -N options on the command line to > get fully parallel directory and inode/policy scanning. Yeah, figured that out, with help from somebody. :) > I'm curious as to what you're trying to do with THRESHOLD(0,100,0) ... > Perhaps premigrate everything (that matches the other conditions)? Yeah, it's actually feeding to LTFS/EE - where we premigrate everything that matches to tape. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Sep 14 22:13:59 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 14 Sep 2017 17:13:59 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. 
In-Reply-To: <26551.1505419780@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> <26551.1505419780@turing-police.cc.vt.edu> Message-ID: BTW - we realize that mmapplypolicy -g and -N is a "gotcha" for some (many?) customer/admins -- so we're considering ways to make that easier -- but without "breaking" scripts and callbacks and what-have-yous that might depend on the current/old defaults... Always a balancing act -- considering that GPFS ne Spectrum Scale just hit its 20th birthday (by IBM reckoning) --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Fri Sep 15 11:47:19 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 15 Sep 2017 10:47:19 +0000 Subject: [gpfsug-discuss] ZIMON Sensors config files... Message-ID: Hi, Does anyone know how to use "mmperfmon config update" to get the "hostname =" field in the ZImonSensors.cfg file populated with the hostname of the node that it's been installed on? By default the field is empty and for some reason on our cluster it doesn't transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 15 16:37:13 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 15 Sep 2017 15:37:13 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? Message-ID: This is very probably off topic here.. I would be happy to get any responses off list. My question is has anyone here set up NFS re-export / proxy with nfs-ganesha? John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Mon Sep 18 01:14:52 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Mon, 18 Sep 2017 00:14:52 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? In-Reply-To: References: Message-ID: <5d1811f4d6ad4605bd2a7c7441f4dd1b@exch1-cdc.nexus.csiro.au> I am interested too, so maybe keep it on list? 
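In case a concrete starting point helps: nfs-ganesha ships a PROXY FSAL that fronts another NFS server, which is the usual building block for this kind of re-export. The block below is only a rough, untested sketch of what such an export in ganesha.conf might look like; the export id, paths and backend address are placeholders, and the option names should be checked against the ganesha version actually deployed.

EXPORT {
    Export_Id = 77;              # arbitrary but unique id (placeholder)
    Path = /backend/export;      # path as exported by the backend NFS server (placeholder)
    Pseudo = /reexport;          # path presented to clients of this ganesha instance (placeholder)
    Access_Type = RW;
    Protocols = 3, 4;            # protocols offered to the ganesha clients
    Squash = No_Root_Squash;
    FSAL {
        Name = PROXY;            # re-export through ganesha's proxy FSAL
        Srv_Addr = 192.0.2.10;   # placeholder address of the backend NFS server
    }
}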
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: Saturday, 16 September 2017 1:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? This is very probably off topic here.. I would be happy to get any responses off list. My question is has anyone here set up NFS re-export / proxy with nfs-ganesha? John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.lefebvre+gpfsug at calculquebec.ca Mon Sep 18 20:16:57 2017 From: richard.lefebvre+gpfsug at calculquebec.ca (Richard Lefebvre) Date: Mon, 18 Sep 2017 15:16:57 -0400 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Message-ID: Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 18 20:27:49 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Sep 2017 19:27:49 +0000 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Message-ID: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> You do realize 3.5 is out of service, correct? You should be looking at upgrading :-) Catching this is real time, when you have a large number of nodes is going to be tough. How you recognizing that the file system is overloaded? Waiters? Looking at which nodes/NSDs have the longest/largest waiters may provide a clue. You might also take a look at mmpmon ? it?s a bit difficult to use in its raw state, but it does provide some good stats on a per file system basis. But you need to track these over times to get what you need. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Richard Lefebvre Reply-To: gpfsug main discussion list Date: Monday, September 18, 2017 at 2:18 PM To: gpfsug Subject: [EXTERNAL] [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scale at us.ibm.com Tue Sep 19 07:47:42 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 19 Sep 2017 14:47:42 +0800 Subject: [gpfsug-discuss] ZIMON Sensors config files... In-Reply-To: References: Message-ID: Hi Neil, Have you tried these steps? mmperfmon config show --config-file /tmp/a vi /tmp/a mmperfmon config update --collectors oc8757286465 --config-file /tmp/a mmperfmon config show Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/15/2017 06:48 PM Subject: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone know how to use ?mmperfmon config update? to get the ?hostname =? field in the ZImonSensors.cfg file populated with the hostname of the node that it?s been installed on? By default the field is empty and for some reason on our cluster it doesn?t transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=JJA1q39zaRyjClihY50646c-CyY4ZvrmpSjR1qs5rTc&s=GWOiCpEHiZ_TqlFj0AeKmjcccnez-X2rHMa5UtvGPTk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Tue Sep 19 07:54:50 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 19 Sep 2017 14:54:50 +0800 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 In-Reply-To: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> References: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> Message-ID: Hi Richard, Is any of tool in https://www.ibm.com/developerworks/community/wikis/home?_escaped_fragment_=/wiki/General%2520Parallel%2520File%2520System%2520%2528GPFS%2529/page/Display%2520per%2520node%2520IO%2520statstics can help you? BTW, I agree with Bob that 3.5 is out-of-service. Without an extended service, you should consider to upgrade your cluster as soon as possible. 
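For the mmpmon route on a 3.5-era cluster, a rough sketch along these lines can help narrow down the busy node: it takes two cluster-wide samples of the per-node I/O counters and leaves the comparison to the admin. The node list, sample interval and temporary file names are assumptions to adapt.

#!/bin/bash
# Sample the per-node GPFS I/O counters twice and compare the snapshots to
# spot the node generating the load. Assumes mmdsh can reach every node.
MMDSH=/usr/lpp/mmfs/bin/mmdsh
MMPMON=/usr/lpp/mmfs/bin/mmpmon

$MMDSH -N all "echo io_s | $MMPMON -p" > /tmp/io_sample.1   # first snapshot
sleep 30                                                    # measurement window
$MMDSH -N all "echo io_s | $MMPMON -p" > /tmp/io_sample.2   # second snapshot

# The parseable (-p) io_s output reports cumulative per-node counters such as
# _br_ (bytes read), _bw_ (bytes written), _rdc_ (read calls) and _wc_ (write
# calls); the node whose counters grow fastest between the two snapshots is
# the likely source of the iops.
diff /tmp/io_sample.1 /tmp/io_sample.2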
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 09/19/2017 03:28 AM Subject: Re: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org You do realize 3.5 is out of service, correct? You should be looking at upgrading :-) Catching this is real time, when you have a large number of nodes is going to be tough. How you recognizing that the file system is overloaded? Waiters? Looking at which nodes/NSDs have the longest/largest waiters may provide a clue. You might also take a look at mmpmon ? it?s a bit difficult to use in its raw state, but it does provide some good stats on a per file system basis. But you need to track these over times to get what you need. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Richard Lefebvre Reply-To: gpfsug main discussion list Date: Monday, September 18, 2017 at 2:18 PM To: gpfsug Subject: [EXTERNAL] [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=AYwUf61wv-Hq63KU7veQSxavdZy-e9eT9bkJFav8MVU&s=W42AQE74bvmOlw7P0D0wTqT0Rxop4KktnXeuDeGGdmk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From rohwedder at de.ibm.com Tue Sep 19 08:42:46 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 19 Sep 2017 09:42:46 +0200 Subject: [gpfsug-discuss] ZIMON Sensors config files... In-Reply-To: References: Message-ID: Hello Neil, While the description below provides a way on how to edit the hostname parameter, you should not have the need to edit the "hostname" parameter. Sensors use the hostname() call to get the hostname where the sensor is running and use this as key in the performance database, which is what you typically want to see. From the description you provide I assume you want to have a sensor running on every node that has the perfmon designation? 
There could be different issues: > In order to enable sensors on every node, you need to ensure there is no "restrict" clause in the sensor description, or the restrict clause has to be set correctly > There could be some other communication issue between sensors and collectors. Restart sensors and collectors and check the logfiles in /var/log/zimon/. You should be able to see which sensors start up and if they can connect. > Can you check if you have the perfmon designation set for the nodes where you expect data from (mmlscluster) Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina K?deritz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "IBM Spectrum Scale" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 09/19/2017 08:48 AM Subject: Re: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Neil, Have you tried these steps? mmperfmon config show --config-file /tmp/a vi /tmp/a mmperfmon config update --collectors oc8757286465 --config-file /tmp/a mmperfmon config show Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. Inactive hide details for "Wilson, Neil" ---09/15/2017 06:48:26 PM---Hi, Does anyone know how to use "mmperfmon config update" "Wilson, Neil" ---09/15/2017 06:48:26 PM---Hi, Does anyone know how to use "mmperfmon config update" to get the "hostname =" field in the ZImon From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/15/2017 06:48 PM Subject: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone know how to use ?mmperfmon config update? to get the ?hostname =? field in the ZImonSensors.cfg file populated with the hostname of the node that it?s been installed on? By default the field is empty and for some reason on our cluster it doesn?t transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
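For anyone working through the three suggestions above, a short command-line checklist covering the restrict clause, the sensor/collector restart plus logs, and the perfmon designation (node names below are placeholders):

# 1. Look for a restrict clause that would keep sensors from running on every node.
mmperfmon config show | grep -i restrict

# 2. Restart sensors and collector, then check the ZIMon logs for connection errors.
systemctl restart pmsensors      # on the sensor nodes
systemctl restart pmcollector    # on the collector node(s)
tail -n 50 /var/log/zimon/*.log

# 3. Confirm the nodes you expect data from carry the perfmon designation,
#    and add it where it is missing (node names are hypothetical).
mmlscluster | grep -i perfmon
# mmchnode --perfmon -N node1,node2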
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=JJA1q39zaRyjClihY50646c-CyY4ZvrmpSjR1qs5rTc&s=GWOiCpEHiZ_TqlFj0AeKmjcccnez-X2rHMa5UtvGPTk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Ow2bpnoab1kboH2xuSUrbx65ALeoAAicG7csl1sV-Qc&s=qZ1XUXWfOayLSSuvcCyHQ2ZgY1mu0Zs3kmpgeVQUCYI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D696444.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mnaineni at in.ibm.com Tue Sep 19 12:50:50 2017 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Tue, 19 Sep 2017 11:50:50 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? (Greg.Lehmann@csiro.au) Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Sep 19 22:02:03 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 19 Sep 2017 21:02:03 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? Message-ID: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Wed Sep 20 00:39:37 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 19 Sep 2017 23:39:37 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? Message-ID: OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 02:21:36 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Tue, 19 Sep 2017 18:21:36 -0700 Subject: [gpfsug-discuss] RoCE not playing ball Message-ID: Hi All, Weirdness with a RoCE interface - verbs is not playing ball and is complaining about the inet6 address not matching up: 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version >= 1.1) loaded and initialized. 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981E1 state DOWN 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 with GID c081f9feff078a26. Please check if the correct inet6 address for the corresponding IP network interface is set 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid verbsPorts defined. Anyone run into this before? I have another node imaged the *exact* same way and no dice. Have tried a variety of drivers, cards, etc, same result every time. Cheers, Barry -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Sep 20 04:07:18 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 20 Sep 2017 11:07:18 +0800 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: References: Message-ID: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. 
mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org OK - I've run across this before, and it's because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster --ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I'll see if I can find this in one of the recent 4.2 release notes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority... ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I've done a "mmsdrrestore -p testnsd2 -R /usr/bin/scp" on both of them. I've also done a "mmccr setup -F" and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I've copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it's not obvious from the above, networking is fine - ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS - or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I've got to run to a meeting right now, so I hope I'm not leaving out any crucial details here - does anyone have an idea what I need to do? Thanks... - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
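
For anyone following the recovery steps suggested above, pulled together as commands they would look roughly like this, using the node names from this thread and assuming testnsd2 is the node that still holds a good configuration. The only command not spelled out in the reply is mmshutdown -a for the "shut the daemon down everywhere" step; treat the whole thing as an untested sketch to adapt, not a verified procedure:

  # stop the daemon on every node first
  mmshutdown -a

  # from the node with the intact configuration (testnsd2 here)
  mmchcluster --ccr-disable -p testnsd2
  mmsdrrestore -a -p testnsd2

  # push the authentication key to the rebuilt quorum nodes
  mmauth genkey propagate -N testnsd1,testnsd3

  # switch back to CCR and bring the cluster up
  mmchcluster --ccr-enable
  mmstartup -a
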
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Wed Sep 20 04:33:16 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 20 Sep 2017 11:33:16 +0800 Subject: [gpfsug-discuss] Disk change problem in gss GNR In-Reply-To: References: Message-ID: Hi Atmane, In terms of this kind of disk management question, I would like to suggest to open a PMR to make IBM service help you. mmdelpdisk command would not need to reboot system to take effect. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: atmane To: "gpfsug-discuss at spectrumscale.org" Date: 09/14/2017 08:50 PM Subject: [gpfsug-discuss] Disk change problem in gss GNR Sent by: gpfsug-discuss-bounces at spectrumscale.org dear all, I change A Disk In Gss Storage Server mmchcarrier BB1RGL --release --pdisk 'e1d1s02' mmchcarrier BB1RGL --replace --pdisk 'e1d1s02' after replace disk Now I Have 2 Discs In My Gss the first disc was well changed name = "e1d1s02" the second disk still after I use this cmd mmdelpdisk BB1RGL --pdisk e1d1s02#004 -a the disk is still in use i need to reboot the system or ?? mmlspdisk all | less pdisk: replacementPriority = 1000 name = "e1d1s02" device = "/dev/sdik,/dev/sdih" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "ok" capacity = 3000034656256 freeSpace = 1453846429696 fru = "00W1572" location = "SV30820390-1-2" WWN = "naa.5000C5008D783E37" server = "gss0-ib0" pdisk: replacementPriority = 1000 name = "e1d1s02#004" device = "" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "missing/noPath/systemDrain/adminDrain/noRGD/noVCD" capacity = 3000034656256 freeSpace = 1599875317760 fru = "00W1572" location = "" WWN = "naa.5000C50056714E83" server = "gss0-ib0" -- -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFbA&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hQ86ctTaI7i14NrB-58_SzqSWnCR8p6b5bFxtzNcSbk&s=mthjH7ebhnNlSJl71hFjF4wZU0iygm3I9wH_Bu7_3Ds&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
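
For anyone who hits the same leftover-pdisk situation as in the GSS/GNR exchange above, the commands discussed there, strung together, would look something like the following. The recovery group (BB1RGL) and pdisk names are the ones from Atmane's output, and, as the Scale team suggests, this is best walked through with a PMR open rather than improvised:

  # release the failed drive, physically swap it, then seat the replacement
  mmchcarrier BB1RGL --release --pdisk 'e1d1s02'
  mmchcarrier BB1RGL --replace --pdisk 'e1d1s02'

  # confirm the new pdisk is ok and see whether the old entry is still draining
  mmlspdisk all | less

  # remove the drained stand-in entry once it is no longer referenced
  mmdelpdisk BB1RGL --pdisk 'e1d1s02#004' -a

No reboot should be needed; as the Scale team notes, mmdelpdisk takes effect without one, and as far as I understand it an entry left in systemDrain/adminDrain state simply lingers until its data has been drained off, after which the delete completes.
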
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Wed Sep 20 06:00:49 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 20 Sep 2017 07:00:49 +0200 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Wed Sep 20 06:13:13 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 05:13:13 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , Message-ID: Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. 
I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Sep 20 06:33:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 05:33:14 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: I should have said, here are the package versions: [root at sgate1 ~]# rpm -qa | grep gpfs gpfs.gpl-4.2.2-3.noarch gpfs.docs-4.2.2-3.noarch gpfs.base-4.2.2-3.x86_64 gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.2-3.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm32_2.el7.x86_64 gpfs.ext-4.2.2-3.x86_64 gpfs.msg.en_US-4.2.2-3.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.2-3.x86_64 ________________________________________ From: Jonathon A Anderson Sent: Tuesday, September 19, 2017 11:13:13 PM To: gpfsug main discussion list Cc: varun.mittal at in.ibm.com; Mark.Bush at siriuscom.com Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. 
Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. 
Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From gangqiu at cn.ibm.com Wed Sep 20 06:58:15 2017 From: gangqiu at cn.ibm.com (Gang Qiu) Date: Wed, 20 Sep 2017 13:58:15 +0800 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: Do you set ip address for these adapters? Refer to the description of verbsRdmaCm in ?Command and Programming Reference': If RDMA CM is enabled for a node, the node will only be able to establish RDMA connections using RDMA CM to other nodes with verbsRdmaCm enabled. RDMA CM enablement requires IPoIB (IP over InfiniBand) with an active IP address for each port. Although IPv6 must be enabled, the GPFS implementation of RDMA CM does not currently support IPv6 addresses, so an IPv4 address must be used. Regards, Gang Qiu ********************************************************************************************** IBM China Systems & Technology Lab Tel: 86-10-82452193 Fax: 86-10-82452312 Moble: 132-6134-8284 Email: gangqiu at cn.ibm.com Address: Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No. 8 Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, P.R.China ??????????????8???????28???????????100193 ********************************************************************************************** From: "Olaf Weiser" To: gpfsug main discussion list Date: 09/20/2017 01:01 PM Subject: Re: [gpfsug-discuss] RoCE not playing ball Sent by: gpfsug-discuss-bounces at spectrumscale.org is ib_read_bw working ? just test it between the two nodes ... From: Barry Evans To: gpfsug main discussion list Date: 09/20/2017 03:21 AM Subject: [gpfsug-discuss] RoCE not playing ball Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Weirdness with a RoCE interface - verbs is not playing ball and is complaining about the inet6 address not matching up: 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version >= 1.1) loaded and initialized. 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981E1 state DOWN 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 with GID c081f9feff078a26. Please check if the correct inet6 address for the corresponding IP network interface is set 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid verbsPorts defined. Anyone run into this before? I have another node imaged the *exact* same way and no dice. Have tried a variety of drivers, cards, etc, same result every time. Cheers, Barry This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=NCthMXTjizwdEVDBqoDwAfRswiFbdQVHRb4mzseFLEM&m=u155tVFn5u91gqIsTXSOSVvpbR7GQRPoVpviUDH73R0&s=63nY5ozD8mej1jefNBZjLGCkNOFD9-swr-lc7CRPbrM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From tortay at cc.in2p3.fr Wed Sep 20 09:03:54 2017 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Wed, 20 Sep 2017 10:03:54 +0200 Subject: [gpfsug-discuss] CCR cluster down for the count? 
In-Reply-To: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> References: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> Message-ID: <853ffcf7-7900-457b-0d8a-2c63886ed245@cc.in2p3.fr> On 19/09/2017 23:02, Buterbaugh, Kevin L wrote: > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? > Hello, I have had the same issue multiple times. The "trick" is to execute "/usr/lpp/mmfs/bin/mmcommon startCcrMonitor" on a majority of quorum nodes (once they have the correct configuration files) to be able to start the cluster. I noticed a call to the above command in the "gpfs.gplbin" spec file in the "%postun" section (when doing RPM upgrades, if I'm not mistaken). . Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From r.sobey at imperial.ac.uk Wed Sep 20 09:23:37 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Sep 2017 08:23:37 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , Message-ID: This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. 
I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From douglasof at us.ibm.com Wed Sep 20 09:28:44 2017 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Wed, 20 Sep 2017 08:28:44 +0000 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC Message-ID: Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. For more information http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ Doug Mobile -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Wed Sep 20 11:47:35 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Wed, 20 Sep 2017 12:47:35 +0200 Subject: [gpfsug-discuss] WANTED: Official support statement using Spectrum Scale 4.2.x with Oracle DB v12 Message-ID: Hi folks, is anyone aware if there is now an official support statement for Spectrum Scale 4.2.x? As far as my understanding goes - we currently have an "older" official support statement for v4.1 with Oracle. Many thanks up-front for any useful hints ... :) Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
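
Coming back to the NFS-without-SMB question earlier in this digest: pulling the suggestions from that thread together, an NFS-only setup would be attempted roughly as below from one of the CES protocol nodes. The export path and client names are the ones Jonathon posted, the option string follows the Access_Type/Protocols/Squash form used earlier in the thread, and, as the replies show, whether the userdefined auth step succeeds without SMB appears to depend on the code level installed - so this is a sketch of the intent, not a confirmed recipe:

  # declare user-defined authentication for file access (run on a CES node)
  /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined

  # create an NFSv3-only export and verify it
  /usr/lpp/mmfs/bin/mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(Access_Type=RW,Protocols=3,Squash=root_squash);dtn*.rc.int.colorado.edu(Access_Type=RW,Protocols=3,Squash=root_squash)'
  /usr/lpp/mmfs/bin/mmnfs export list
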
Name: 15225079.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Sep 20 14:55:28 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 20 Sep 2017 13:55:28 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: References: Message-ID: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. 
ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. 
Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=mBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y&s=YJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 15:17:34 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Sep 2017 07:17:34 -0700 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: Yep, IP's set ok. We did try with ipv6 off to see what would happen, then turned it back on again. There are ipv6 addresses on the cards, but ipv4 is the only thing actually being used. On Tue, Sep 19, 2017 at 10:58 PM, Gang Qiu wrote: > > > > Do you set ip address for these adapters? > > Refer to the description of verbsRdmaCm in ?Command and Programming > Reference': > > If RDMA CM is enabled for a node, the node will only be able to establish > RDMA connections > using RDMA CM to other nodes with *verbsRdmaCm *enabled. RDMA CM > enablement requires > IPoIB (IP over InfiniBand) with an active IP address for each port. > Although IPv6 must be > enabled, the GPFS implementation of RDMA CM does not currently support > IPv6 addresses, so > an IPv4 address must be used. > > > > Regards, > Gang Qiu > > ************************************************************ > ********************************** > IBM China Systems & Technology Lab > Tel: 86-10-82452193 > Fax: 86-10-82452312 > Moble: 132-6134-8284 > Email: gangqiu at cn.ibm.com > Address: Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No. 8 > Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, > P.R.China > ??????????????8???????28???????????100193 > ************************************************************ > ********************************** > > > > From: "Olaf Weiser" > To: gpfsug main discussion list > Date: 09/20/2017 01:01 PM > Subject: Re: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > is ib_read_bw working ? > just test it between the two nodes ... 
> > > > > From: Barry Evans > To: gpfsug main discussion list > Date: 09/20/2017 03:21 AM > Subject: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Weirdness with a RoCE interface - verbs is not playing ball and is > complaining about the inet6 address not matching up: > > 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes > verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version > >= 1.1) loaded and initialized. > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced > from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 > * nspdQueues 1)). > 2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981E1 state DOWN > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE > 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 > 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort > mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 > with GID c081f9feff078a26. Please check if the correct inet6 address for > the corresponding IP network interface is set > 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 > 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. > 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid > verbsPorts defined. > > > Anyone run into this before? I have another node imaged the *exact* same > way and no dice. Have tried a variety of drivers, cards, etc, same result > every time. > > Cheers, > Barry > > > > > > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. 
Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > NCthMXTjizwdEVDBqoDwAfRswiFbdQVHRb4mzseFLEM&m= > u155tVFn5u91gqIsTXSOSVvpbR7GQRPoVpviUDH73R0&s= > 63nY5ozD8mej1jefNBZjLGCkNOFD9-swr-lc7CRPbrM&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 15:23:21 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Sep 2017 07:23:21 -0700 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: It has worked, yes, and while the issue has been present. At the moment it's not working, but I'm not entirely surprised with the amount it's been poked at. Cheers, Barry On Tue, Sep 19, 2017 at 10:00 PM, Olaf Weiser wrote: > is ib_read_bw working ? > just test it between the two nodes ... > > > > > From: Barry Evans > To: gpfsug main discussion list > Date: 09/20/2017 03:21 AM > Subject: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Weirdness with a RoCE interface - verbs is not playing ball and is > complaining about the inet6 address not matching up: > > 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes > verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version > >= 1.1) loaded and initialized. > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced > from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 > * nspdQueues 1)). 
> 2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981E1 state DOWN > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE > 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 > 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort > mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 > with GID c081f9feff078a26. Please check if the correct inet6 address for > the corresponding IP network interface is set > 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 > 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. > 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid > verbsPorts defined. > > > Anyone run into this before? I have another node imaged the *exact* same > way and no dice. Have tried a variety of drivers, cards, etc, same result > every time. > > Cheers, > Barry > > > > > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. 
Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Sep 20 17:00:15 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 20 Sep 2017 09:00:15 -0700 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC In-Reply-To: References: Message-ID: Thanks Doug. If you plan to go, *do register*. GPFS Day is free, but we need to know how many will attend. Register using the link on the HPCXXL event page below. Cheers, Kristy > On Sep 20, 2017, at 1:28 AM, Douglas O'flaherty wrote: > > > Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. > > > For more information > http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ > > Doug > > Mobile > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Sep 20 17:27:48 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 20 Sep 2017 16:27:48 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <20170920114844.6bf9f27b@osc.edu> References: <20170920114844.6bf9f27b@osc.edu> Message-ID: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Hi Ed, Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... Kevin /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 2983 1 0 Sep18 ? 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 6512 1 0 Sep18 ? 
00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes 3,0,10.0.6.215,,testnsd3.vampire 1,0,10.0.6.213,,testnsd1.vampire 2,0,10.0.6.214,,testnsd2.vampire /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 
1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen root at testnsd2# On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: I've run into this before. We didn't use to use CCR. And restoring nodes for us is a major pain in the rear as we only allow one-way root SSH, so we have a number of useful little scripts to work around problems like this. Assuming that you have all the necessary files copied to the correct places, you can manually kick off CCR. I think my script does something like: (copy the encryption key info) scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor you should then see like 2 copies of it running under mmksh. Ed On Wed, 20 Sep 2017 13:55:28 +0000 "Buterbaugh, Kevin L" > wrote: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. 
ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. 
vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. 
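For what it's worth, the six numbered steps quoted above boil down to roughly the following shell sketch (untested here and only a sketch: "NodeA" stands for whichever node still holds a good copy of the configuration, and the -N list simply reuses the node names from this thread):

mmshutdown -a                                  # step 2: make sure the daemon is down everywhere
mmchcluster --ccr-disable -p NodeA             # step 3: fall back to the server-based configuration repository
mmsdrrestore -a -p NodeA                       # step 4: push NodeA's mmsdrfs back out to all nodes
mmauth genkey propagate -N testnsd1,testnsd3   # step 5: redistribute the key to the rebuilt nodes
mmchcluster --ccr-enable                       # step 6: switch CCR back on
mmstartup -a                                   # then retry startup
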
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Sep 20 18:48:26 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 20 Sep 2017 19:48:26 +0200 Subject: [gpfsug-discuss] CCR cluster down for the count? 
In-Reply-To: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Message-ID: <1f0b2657-8ca3-7b35-95f3-7c4edb6c0818@ugent.be> hi kevin, we were hit by similar issue when we did something not so smart: we had a 5 node quorum, and we wanted to replace 1 test node with 3 more production quorum node. we however first removed the test node, and then with 4 quorum nodes we did mmshutdown for some other config modifications. when we tried to start it, we hit the same "Not enough CCR quorum nodes available" errors. also, none of the ccr commands were helpful; they also hanged, even simple ones like show etc etc. what we did in the end was the following (and some try-and-error): from the /var/adm/ras/mmsdrserv.log logfiles we guessed that we had some sort of split brain paxos cluster (some reported " ccrd: recovery complete (rc 809)", some same message with 'rc 0' and some didn't have the recovery complete on the last line(s)) * stop ccr everywhere mmshutdown -a mmdsh -N all pkill -9 -f mmccr * one by one, start the paxos cluster using mmshutdown on the quorum nodes (mmshutdown will start ccr and there is no unit or something to help with that). * the nodes will join after 3-4 minutes and report "recovery complete"; wait for it before you start another one * the trial-and-error part was that sometimes there was recovery complete with rc=809, sometimes with rc=0. in the end, once they all had same rc=0, paxos was happy again and eg mmlsconfig worked again. this left a very bad experience with CCR with us, but we want to use ces, so no real alternative (and to be honest, with odd number of quorum, we saw no more issues, everyting was smooth). in particular we were missing * unit files for all extra services that gpfs launched (mmccrmoniotr, mmsysmon); so we can monitor and start/stop them cleanly * ccr commands that work with broken paxos setup; eg to report that the paxos cluster is broken or operating in some split-brain mode. anyway, YMMV and good luck. stijn On 09/20/2017 06:27 PM, Buterbaugh, Kevin L wrote: > Hi Ed, > > Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. > > Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... > > Kevin > > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort > testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd2: root 2983 1 0 Sep18 ? 
00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 6512 1 0 Sep18 ? 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort > testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached > testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed > testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes > testdellnode1: total 12 > testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed > testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached > testgateway: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testgateway: total 12 > testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached > testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed > testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks > testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth > testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes > testnsd1: total 8 > testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached > testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed > testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 > testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 > testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks > testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd2: total 16 > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed > testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks > testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth > testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd3: total 8 > testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed > testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached > testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testsched: total 12 > /var/mmfs/gen > root at testnsd2# more ../ccr/ccr.nodes > 3,0,10.0.6.215,,testnsd3.vampire > 1,0,10.0.6.213,,testnsd1.vampire > 2,0,10.0.6.214,,testnsd2.vampire > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" > testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs > testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs > testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs > testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > testsched: -rw-r--r--. 
1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" > testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" > testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > /var/mmfs/gen > root at testnsd2# > > On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: > > I've run into this before. We didn't use to use CCR. And restoring nodes for > us is a major pain in the rear as we only allow one-way root SSH, so we have a > number of useful little scripts to work around problems like this. > > Assuming that you have all the necessary files copied to the correct > places, you can manually kick off CCR. > > I think my script does something like: > > (copy the encryption key info) > > scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ > > scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ > > scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ > > :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor > > you should then see like 2 copies of it running under mmksh. > > Ed > > > On Wed, 20 Sep 2017 13:55:28 +0000 > "Buterbaugh, Kevin L" > wrote: > > Hi All, > > testnsd1 and testnsd3 both had hardware issues (power supply and internal HD > respectively). Given that they were 12 year old boxes, we decided to replace > them with other boxes that are a mere 7 years old ? keep in mind that this is > a test cluster. > > Disabling CCR does not work, even with the undocumented ??force? option: > > /var/mmfs/gen > root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force > mmchcluster: Unable to obtain the GPFS configuration file lock. > mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. > mmchcluster: Processing continues without lock protection. > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key > fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key > fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? 
The authenticity of host 'vmp608.vampire > (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp612.vampire > (10.0.21.12)' can't be established. ECDSA key fingerprint is > SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is > MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's password: > testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire > remote shell process had return code 255. testnsd1.vampire: Host key > verification failed. mmdsh: testnsd1.vampire remote shell process had return > code 255. vmp609.vampire: Host key verification failed. mmdsh: > vmp609.vampire remote shell process had return code 255. vmp608.vampire: > Host key verification failed. mmdsh: vmp608.vampire remote shell process had > return code 255. vmp612.vampire: Host key verification failed. mmdsh: > vmp612.vampire remote shell process had return code 255. > > root at vmp610.vampire's password: vmp610.vampire: > Permission denied, please try again. > > root at vmp610.vampire's password: vmp610.vampire: > Permission denied, please try again. > > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. > > Verifying GPFS is stopped on all nodes ... > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key > fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key > fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp609.vampire > (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire > (10.0.6.213)' can't be established. ECDSA key fingerprint is > SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is > MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's password: > root at vmp610.vampire's password: > root at vmp610.vampire's password: > > testnsd3.vampire: Host key verification failed. > mmdsh: testnsd3.vampire remote shell process had return code 255. > vmp612.vampire: Host key verification failed. > mmdsh: vmp612.vampire remote shell process had return code 255. > vmp608.vampire: Host key verification failed. > mmdsh: vmp608.vampire remote shell process had return code 255. 
> vmp609.vampire: Host key verification failed. > mmdsh: vmp609.vampire remote shell process had return code 255. > testnsd1.vampire: Host key verification failed. > mmdsh: testnsd1.vampire remote shell process had return code 255. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. mmchcluster: Command failed. > Examine previous error messages to determine cause. /var/mmfs/gen > root at testnsd2# > > I believe that part of the problem may be that there are 4 client nodes that > were removed from the cluster without removing them from the cluster (done by > another SysAdmin who was in a hurry to repurpose those machines). They?re up > and pingable but not reachable by GPFS anymore, which I?m pretty sure is > making things worse. > > Nor does Loic?s suggestion of running mmcommon work (but thanks for the > suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to > start the cluster up failed: > > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# > > Thanks. > > Kevin > > On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > > wrote: > > > Hi Kevin, > > Let's me try to understand the problem you have. What's the meaning of node > died here. Are you mean that there are some hardware/OS issue which cannot be > fixed and OS cannot be up anymore? > > I agree with Bob that you can have a try to disable CCR temporally, restore > cluster configuration and enable it again. > > Such as: > > 1. Login to a node which has proper GPFS config, e.g NodeA > 2. Shutdown daemon in all client cluster. > 3. mmchcluster --ccr-disable -p NodeA > 4. mmsdrrestore -a -p NodeA > 5. mmauth genkey propagate -N testnsd1, testnsd3 > 6. mmchcluster --ccr-enable > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in other > countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run > across this before, and it?s because of a bug (as I recall) having to do with > CCR and > > From: "Oesterlin, Robert" > > To: gpfsug > main discussion list > > > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for > the count? 
Sent by: > gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > OK ? I?ve run across this before, and it?s because of a bug (as I recall) > having to do with CCR and quorum. What I think you can do is set the cluster > to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back > up and then re-enable ccr. > > I?ll see if I can find this in one of the recent 4.2 release nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > > > on behalf of "Buterbaugh, Kevin L" > > > Reply-To: gpfsug main discussion list > > > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > > > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? > > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD > servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while > back. I did nothing about it at the time because it was due to be life-cycled > as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now > resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve > done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also > done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from > testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to > testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? > ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any > crucial details here ? does anyone have an idea what I need to do? Thanks? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 > > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From jonathon.anderson at colorado.edu Wed Sep 20 19:55:04 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 18:55:04 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? 
~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... 
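Taken together, the advice in this older thread amounts to something like the sketch below, run from a CES protocol node. It is only an illustration: the export path and option names are the examples quoted above (with the unbalanced parenthesis fixed and Protocols=3 only, as suggested), --data-access-method file is the corrected flag from the follow-up, and as the errors earlier in this message show, the mmuserauth step may still refuse to run when the SMB packages are missing.

mmuserauth service create --data-access-method file --type userdefined   # "trust the NFS client" UID/GID handling, no AD/LDAP
mmnfs export add /fs_gpfs01 -c "*(Access_Type=RW,Protocols=3,Squash=no_root_squash)"   # NFSv3 only
mmnfs export list   # confirm the export was created
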
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From ewahl at osc.edu Wed Sep 20 20:07:39 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 20 Sep 2017 15:07:39 -0400 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Message-ID: <20170920150739.39f0a4a0@osc.edu> So who was the ccrmaster before? What is/was the quorum config? (tiebreaker disks?) what does 'mmccr check' say? Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info from the error? Ed On Wed, 20 Sep 2017 16:27:48 +0000 "Buterbaugh, Kevin L" wrote: > Hi Ed, > > Thanks for the suggestion ? that?s basically what I had done yesterday after > Googling and getting a hit or two on the IBM DeveloperWorks site. 
I?m > including some output below which seems to show that I?ve got everything set > up but it?s still not working. > > Am I missing something? We don?t use CCR on our production cluster (and this > experience doesn?t make me eager to do so!), so I?m not that familiar with > it... > > Kevin > > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v > grep" | sort testdellnode1: root 2583 1 0 May30 ? > 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testdellnode1: root 6694 2583 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 2023 5828 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 5828 1 0 Sep18 ? > 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: > root 19356 4628 0 11:19 tty1 > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: > root 4628 1 0 Sep19 tty1 > 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: > root 22149 2983 0 11:16 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: > root 2983 1 0 Sep18 ? > 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: > root 15685 6557 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: > root 6557 1 0 Sep19 ? > 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 29424 6512 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 6512 1 0 Sep18 ? > 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor > 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR > quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr > fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous > error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh > -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: > drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 > root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root > root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: > drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. > 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 1 root root > 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root > root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 > committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks > testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: > -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 > testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: > drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 > root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root > 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 > 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: > -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root > root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 > 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 
2 root root 4096 > Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 > cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes > 3,0,10.0.6.215,,testnsd3.vampire > 1,0,10.0.6.213,,testnsd1.vampire > 2,0,10.0.6.214,,testnsd2.vampire > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" > testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs > testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs > testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs > testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 > 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug > 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 1 root root 20360 Aug > 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" > testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames > "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testdellnode1: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testgateway: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen > root at testnsd2# > > On Sep 20, 2017, at 10:48 AM, Edward Wahl > > wrote: > > I've run into this before. We didn't use to use CCR. And restoring nodes for > us is a major pain in the rear as we only allow one-way root SSH, so we have a > number of useful little scripts to work around problems like this. > > Assuming that you have all the necessary files copied to the correct > places, you can manually kick off CCR. > > I think my script does something like: > > (copy the encryption key info) > > scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ > > scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ > > scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ > > :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor > > you should then see like 2 copies of it running under mmksh. > > Ed > > > On Wed, 20 Sep 2017 13:55:28 +0000 > "Buterbaugh, Kevin L" > > > wrote: > > Hi All, > > testnsd1 and testnsd3 both had hardware issues (power supply and internal HD > respectively). Given that they were 12 year old boxes, we decided to replace > them with other boxes that are a mere 7 years old ? keep in mind that this is > a test cluster. > > Disabling CCR does not work, even with the undocumented ??force? option: > > /var/mmfs/gen > root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force > mmchcluster: Unable to obtain the GPFS configuration file lock. > mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. > mmchcluster: Processing continues without lock protection. 
> The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key > fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key > fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp608.vampire > (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp612.vampire > (10.0.21.12)' can't be established. ECDSA key fingerprint is > SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is > MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's > password: testnsd3.vampire: Host key verification failed. mmdsh: > testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: > Host key verification failed. mmdsh: testnsd1.vampire remote shell process > had return code 255. vmp609.vampire: Host key verification failed. mmdsh: > vmp609.vampire remote shell process had return code 255. vmp608.vampire: > Host key verification failed. mmdsh: vmp608.vampire remote shell process had > return code 255. vmp612.vampire: Host key verification failed. mmdsh: > vmp612.vampire remote shell process had return code 255. > > root at vmp610.vampire's > password: vmp610.vampire: Permission denied, please try again. > > root at vmp610.vampire's > password: vmp610.vampire: Permission denied, please try again. > > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. > > Verifying GPFS is stopped on all nodes ... > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key > fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key > fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp609.vampire > (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. 
ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire > (10.0.6.213)' can't be established. ECDSA key fingerprint is > SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is > MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's > password: > root at vmp610.vampire's > password: > root at vmp610.vampire's > password: > > testnsd3.vampire: Host key verification failed. > mmdsh: testnsd3.vampire remote shell process had return code 255. > vmp612.vampire: Host key verification failed. > mmdsh: vmp612.vampire remote shell process had return code 255. > vmp608.vampire: Host key verification failed. > mmdsh: vmp608.vampire remote shell process had return code 255. > vmp609.vampire: Host key verification failed. > mmdsh: vmp609.vampire remote shell process had return code 255. > testnsd1.vampire: Host key verification failed. > mmdsh: testnsd1.vampire remote shell process had return code 255. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. mmchcluster: Command failed. > Examine previous error messages to determine cause. /var/mmfs/gen > root at testnsd2# > > I believe that part of the problem may be that there are 4 client nodes that > were removed from the cluster without removing them from the cluster (done by > another SysAdmin who was in a hurry to repurpose those machines). They?re up > and pingable but not reachable by GPFS anymore, which I?m pretty sure is > making things worse. > > Nor does Loic?s suggestion of running mmcommon work (but thanks for the > suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to > start the cluster up failed: > > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# > > Thanks. > > Kevin > > On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > > wrote: > > > Hi Kevin, > > Let's me try to understand the problem you have. What's the meaning of node > died here. Are you mean that there are some hardware/OS issue which cannot be > fixed and OS cannot be up anymore? > > I agree with Bob that you can have a try to disable CCR temporally, restore > cluster configuration and enable it again. > > Such as: > > 1. Login to a node which has proper GPFS config, e.g NodeA > 2. Shutdown daemon in all client cluster. > 3. mmchcluster --ccr-disable -p NodeA > 4. mmsdrrestore -a -p NodeA > 5. mmauth genkey propagate -N testnsd1, testnsd3 > 6. 
mmchcluster --ccr-enable > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in other > countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run > across this before, and it?s because of a bug (as I recall) having to do with > CCR and > > From: "Oesterlin, Robert" > > > To: gpfsug main discussion list > > > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for > the count? Sent by: > gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > OK ? I?ve run across this before, and it?s because of a bug (as I recall) > having to do with CCR and quorum. What I think you can do is set the cluster > to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back > up and then re-enable ccr. > > I?ll see if I can find this in one of the recent 4.2 release nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > > > on behalf of "Buterbaugh, Kevin L" > > > Reply-To: gpfsug main discussion list > > > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > > > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? > > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD > servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while > back. I did nothing about it at the time because it was due to be life-cycled > as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now > resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve > done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also > done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from > testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to > testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? > ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any > crucial details here ? 
does anyone have an idea what I need to do? Thanks? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 > > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From tarak.patel at canada.ca Wed Sep 20 21:23:00 2017 From: tarak.patel at canada.ca (Patel, Tarak (SSC/SPC)) Date: Wed, 20 Sep 2017 20:23:00 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: Hi, Recently we deployed 3 sets of CES nodes where we are using LDAP for authentication service. We had to create a user in ldap which was used by 'mmuserauth service create' command. Note that SMB needs to be disabled ('mmces service disable smb') if not being used before issuing 'mmuserauth service create'. By default, CES deployment enables SMB (' spectrumscale config protocols'). Tarak -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September, 2017 14:55 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." 
I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. 
mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but not > for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the > NFS client tells you". This of course only works sanely if each NFS > export is only to a set of machines in the same administrative domain > that manages their UID/GIDs. Exporting to two sets of machines that > don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpi > Bv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiy > liSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ > 0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGV > srSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwC > YeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbj > XI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuv > EeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discus > s > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org 
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From chetkulk at in.ibm.com  Thu Sep 21 06:33:53 2017
From: chetkulk at in.ibm.com (Chetan R Kulkarni)
Date: Thu, 21 Sep 2017 11:03:53 +0530
Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication
In-Reply-To: 
References: <27469.1500914134@turing-police.cc.vt.edu>
Message-ID: 

Hi Jonathon,

I can configure file userdefined authentication with only NFS enabled/running on my test setup (SMB was disabled). Please check if the following steps help fix your issue:

1> remove existing file auth if any
/usr/lpp/mmfs/bin/mmuserauth service remove --data-access-method file

2> disable smb service
/usr/lpp/mmfs/bin/mmces service disable smb
/usr/lpp/mmfs/bin/mmces service list -a

3> configure userdefined file auth
/usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined

4> if the above fails, retry mmuserauth in debug mode as below and please share the error log /tmp/userdefined.log. Also share the Spectrum Scale version you are running:
export DEBUG=1; /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined > /tmp/userdefined.log 2>&1; unset DEBUG
/usr/lpp/mmfs/bin/mmdiag --version

5> if mmuserauth succeeds in step 3> above, you also need to correct your mmnfs CLI command as below. You missed typing Access_Type= and Squash= in the client definition.
mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(Access_Type=rw,Squash=root_squash);dtn*.rc.int.colorado.edu(Access_Type=rw,Squash=root_squash)'

Thanks,
Chetan.

From: Jonathon A Anderson
To: gpfsug main discussion list
Date: 09/21/2017 12:25 AM
Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication
Sent by: gpfsug-discuss-bounces at spectrumscale.org

I shouldn't need SMB for authentication if I'm only using userdefined authentication, though.
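Taken together, Chetan's steps 1> to 5> amount to roughly the following sketch, run on a CES protocol node. The export path and client patterns are the ones from Jonathon's environment, the trailing commands are just one way to verify the result, and <ces-ip> is a placeholder:

# reset file authentication and disable SMB, since only NFS is wanted
/usr/lpp/mmfs/bin/mmces service disable smb
/usr/lpp/mmfs/bin/mmuserauth service remove --data-access-method file
/usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined

# create the export with explicit Access_Type= and Squash= options
/usr/lpp/mmfs/bin/mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(Access_Type=rw,Squash=root_squash);dtn*.rc.int.colorado.edu(Access_Type=rw,Squash=root_squash)'

# verify authentication, services and the export definition
/usr/lpp/mmfs/bin/mmuserauth service list
/usr/lpp/mmfs/bin/mmces service list -a
/usr/lpp/mmfs/bin/mmnfs export list

# from an NFS client, confirm the export is visible (<ces-ip> is a placeholder)
showmount -e <ces-ip>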
HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu (rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause.

I exported the NFS via /etc/exports and then ./exportfs -a. It works fine; I was able to mount the gpfs export from another machine. This was my work-around, since the Spectrum Scale tools failed to export NFSv3.

On Mon, Jul 24, 2017 at 7:35 PM, wrote:
> On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said:
>> Hi,
>> I have gpfs with 2 Nodes (redhat).
>> I am trying to create NFS share - So I would be able to mount and
>> access it from another linux machine.
>
>> While trying to create NFS (I execute the following):
>> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "*
>> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)"
>
> You can get away with little to no authentication for NFSv3, but
> not for NFSv4. Try with Protocols=3 only and
>
> mmuserauth service create --type userdefined
>
> that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS
> client tells you". This of course only works sanely if each NFS export is
> only to a set of machines in the same administrative domain that manages
> their UID/GIDs. Exporting to two sets of machines that don't coordinate
> their UID/GID space is, of course, where hilarity and hijinks ensue....
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
- Ilan Schwarts

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

From andreas.mattsson at maxiv.lu.se  Thu Sep 21 13:09:29 2017
From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson)
Date: Thu, 21 Sep 2017 12:09:29 +0000
Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES
Message-ID: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se>

Since I solved this old issue a long time ago, I thought I'd come back and report the solution in case someone else encounters similar problems in the future.

Original problem reported by users:
Copying files between folders on NFS exports from a CES server gave random timestamps on the files. Also, apart from the initially reported problem, there were issues where users sometimes couldn't change or delete files that they were owners of.

Background:
We have an Active Directory with RFC2307 posix attributes populated, and use the built-in Winbind-based AD authentication with RFC2307 ID mapping on our Spectrum Scale CES protocol servers. All our Linux clients and servers are also AD integrated, using Nslcd and nss-pam-ldapd.
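One quick way to see this mapping in practice, and the mismatch described under Cause below, could be to compare how a CES protocol node and an nslcd client resolve the same user and group. 'someuser' and 'UserGroup' are placeholder names, not accounts from our environment:

# on a CES protocol node (Winbind-based AD authentication with RFC2307 ID mapping)
/usr/lpp/mmfs/bin/mmuserauth service list
id someuser                # Winbind reports group names in lower case
getent group UserGroup     # mixed-case lookup, as stored in AD
getent group usergroup     # lower-case lookup; compare the GID numbers

# on an Nslcd/nss-pam-ldapd client, for comparison
id someuser                # Nslcd retains the case stored in AD
getent group UserGroup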
Trigger: If a user was part of a AD group with a mixed case name, and this group gave access to a folder, and the NFS mount was done using NFSv4, the behavior in my original post occurred when copying or changing files in that folder. Cause: Active Directory handle LDAP-requests case insensitive, but results are returned with case retained. Winbind and SSSD-AD converts groups and usernames to lower case. Nslcd retains case. We run NFS with managed GIDs. Managed GIDs in NFSv3 seems to be handled case insensitive, or to ignore the actual group name after it has resolved the GID-number of the group, while NFSv4 seems to handle group names case sensitive and check the actual group name for certain operations even if the GID-number matches. Don't fully understand the mechanism behind why certain file operations would work but others not, but in essence a user would be part of a group called "UserGroup" with GID-number 1234 in AD and on the client, but would be part of a group called "usergroup" with GID-number 1234 on the CES server. Any operation that's authorized on the GID-number, or a case insensitive lookup of the group name, would work. Any operation authorized by a case sensitive group lookup would fail. Three different workarounds where found to work: 1. Rename groups and users to lower case in AD 2. Change from Nslcd to either SSSD or Winbind on the clients 3. Change from NFSv4 to NFSv3 when mounting NFS Remember to clear ID-mapping caches. Regards, Andreas ___________________________________ [https://mail.google.com/mail/u/0/?ui=2&ik=b0a6f02971&view=att&th=14618fab2daf0e10&attid=0.1.1&disp=emb&zw&atsh=1] Andreas Mattsson System Engineer MAX IV Laboratory Lund University Tel: +46-706-649544 E-mail: andreas.mattsson at maxlab.lu.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Stephen Ulmer Skickat: den 3 februari 2017 14:35:21 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES Does the cp actually complete? As in, does it copy all of the blocks? What?s the exit code? A cp?d file should have ?new? metadata. That is, it should have it?s own dates, owners, etc. (not necessarily copied from the source file). I ran ?strace cp foo1 foo2?, and it was pretty instructive, maybe that would get you more info. On CentOS strace is in it?s own package, YMMV. -- Stephen On Feb 3, 2017, at 8:19 AM, Andreas Mattsson > wrote: That works. ?touch test100? Feb 3 14:16 test100 ?cp test100 test101? Feb 3 14:16 test100 Apr 21 2027 test101 ?touch ?r test100 test101? Feb 3 14:16 test100 Feb 3 14:16 test101 /Andreas That?s a cool one. :) What if you use the "random date" file as a time reference to touch another file (like, 'touch -r file02 file03?)? -- Stephen On Feb 3, 2017, at 7:46 AM, Andreas Mattsson > wrote: I?m having some really strange timestamp behaviour when doing file operations on NFS mounts shared via CES on spectrum scale 4.2.1.1 The NFS clients are up to date Centos and Debian machines. All Scale servers and NFS clients have correct date and time via NTP. Creating a file, for instance ?touch file00?, gives correct timestamp. Moving the file, ?mv file00 file01?, gives correct timestamp Copying the file, ?cp file01 file02?, gives a random timestamp anywhere in time, for instance Oct 12 2095 or Feb 29 1976 or something similar. This is only via NFS. Copying the file via a native gpfs-mount or via SMB gives a correct timestamp. 
Doing the same operation over NFS to other NFS-servers works correct, it is only when operating on the NFS-share from the Spectrum Scale CES the issue occurs. Have anyone seen this before? Regards, Andreas Mattsson _____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 225 94 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylorm at us.ibm.com Thu Sep 21 15:33:00 2017 From: taylorm at us.ibm.com (Michael L Taylor) Date: Thu, 21 Sep 2017 07:33:00 -0700 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Hi Jonathon, We were able to run this scenario successfully in our lab at the latest released 4.2.3.4. # /usr/lpp/mmfs/bin/mmdiag --version === mmdiag: version === Current GPFS build: "4.2.3.4 ". # /usr/lpp/mmfs/bin/mmces service list -a Enabled services: NFS node1.test.ibm.com: NFS is running # /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined File authentication configuration completed successfully. # rpm -qa | grep gpfs gpfs.ext-4.2.3-4.x86_64 gpfs.docs-4.2.3-4.noarch gpfs.gskit-8.0.50-75.x86_64 gpfs.gpl-4.2.3-4.noarch gpfs.msg.en_US-4.2.3-4.noarch nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 gpfs.base-4.2.3-4.x86_64 # rpm -qa | grep nfs-gan nfs-ganesha-utils-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/20/2017 12:07 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 42 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=BpVUgvFT2Qwgw0hveEgQaHFwn2mjeQjeBrkXHX_aC0A&m=2oGcWc1xx6zOclryoU2BdJykABuIR118zXTmSAA8msU&s=7q0JMYVHMSGlUAYquNMlrDRF6BDj6-76Oc4VbXrvlHE&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: export nfs share on gpfs with no authentication (Jonathon A Anderson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 20 Sep 2017 18:55:04 +0000 From: Jonathon A Anderson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Content-Type: text/plain; charset="us-ascii" I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. 
________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu (rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Sep 21 18:09:52 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 21 Sep 2017 17:09:52 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <20170920150739.39f0a4a0@osc.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> <20170920150739.39f0a4a0@osc.edu> Message-ID: Hi All, Ralf Eberhard of IBM helped me resolve this off list. The key was to temporarily make testnsd1 and testnsd3 not be quorum nodes by making sure GPFS was down and then executing: mmchnode --nonquorum -N testnsd1,testnsd3 --force That gave me some scary messages about overriding normal GPFS quorum semantics, but nce that was done I was able to run an ?mmstartup -a? and bring up the cluster! Once it was up and I had verified things were working properly I then shut it back down so that I could rerun the mmchnode (without the ?force) to make testnsd1 and testnsd3 quorum nodes again. Thanks to all who helped me out here? Kevin On Sep 20, 2017, at 2:07 PM, Edward Wahl > wrote: So who was the ccrmaster before? What is/was the quorum config? (tiebreaker disks?) 
what does 'mmccr check' say? Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info from the error? Ed On Wed, 20 Sep 2017 16:27:48 +0000 "Buterbaugh, Kevin L" > wrote: Hi Ed, Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... Kevin /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 2983 1 0 Sep18 ? 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 6512 1 0 Sep18 ? 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 
1 root root 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes 3,0,10.0.6.215,,testnsd3.vampire 1,0,10.0.6.213,,testnsd1.vampire 2,0,10.0.6.214,,testnsd2.vampire /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen root at testnsd2# On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: I've run into this before. We didn't use to use CCR. And restoring nodes for us is a major pain in the rear as we only allow one-way root SSH, so we have a number of useful little scripts to work around problems like this. Assuming that you have all the necessary files copied to the correct places, you can manually kick off CCR. 
I think my script does something like: (copy the encryption key info) scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor you should then see like 2 copies of it running under mmksh. Ed On Wed, 20 Sep 2017 13:55:28 +0000 "Buterbaugh, Kevin L" > wrote: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. 
ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? 
I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. 
Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cfabfdb4659d249e2d20308d5005ae1ab%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415312700069585&sdata=Z59ik0w%2BaK6bV2JsDxSNt%2FsqwR1ESuqkXTQVBlRjDgw%3D&reserved=0 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 21 19:49:29 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 21 Sep 2017 11:49:29 -0700 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC In-Reply-To: References: Message-ID: Registration space is getting tight. We decided on a room reconfiguration today to make a little more room. So if you tried to register and were told it was full try again. If it fills up again and you want to register, but can?t drop me an email and I?ll see what we can do. Best, Kristy > On Sep 20, 2017, at 9:00 AM, Kristy Kallback-Rose wrote: > > Thanks Doug. > > If you plan to go, *do register*. GPFS Day is free, but we need to know how many will attend. Register using the link on the HPCXXL event page below. > > Cheers, > Kristy > >> On Sep 20, 2017, at 1:28 AM, Douglas O'flaherty > wrote: >> >> >> Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. 
>> >> >> For more information >> http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ >> >> Doug >> >> Mobile >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Fri Sep 22 23:08:58 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Fri, 22 Sep 2017 22:08:58 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se> References: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se>, , Message-ID: An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Fri Sep 22 23:10:45 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Fri, 22 Sep 2017 22:10:45 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: References: , <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se>, , Message-ID: An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Sun Sep 24 19:04:59 2017 From: bipcuds at gmail.com (Keith Ball) Date: Sun, 24 Sep 2017 14:04:59 -0400 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Message-ID: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. }, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. 
changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 <%28540%29%20557-7851> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Sun Sep 24 20:29:10 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Sun, 24 Sep 2017 12:29:10 -0700 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? In-Reply-To: References: Message-ID: Hi Keith, We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues with Zimon. However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time like with any round robin style data collection scheme. So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI which uses a postgres db (iirc, about to take off on a flight and can't check), or looked at the Grafana bridge which would get data into OpenTsdb format, again iirc. Anyway, just some things for consideration and a request to share back whatever you find out if it's off list. Thanks, getting stink eye to go to airplane mode. More later. Cheers Kristy On Sep 24, 2017 11:05 AM, "Keith Ball" wrote: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. 
}, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 <%28540%29%20557-7851> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkomandu at in.ibm.com Mon Sep 25 06:26:15 2017 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Mon, 25 Sep 2017 10:56:15 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Jonathon, This requires SMB service when you are at 422 PTF2. As Mike pointed out if you upgrade to the 4.2.3-3/4 build you will no longer hit that issue With Regards, Ravi K Komanduri Email:rkomandu at in.ibm.com From: "Michael L Taylor" To: gpfsug-discuss at spectrumscale.org Date: 09/21/2017 08:03 PM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathon, We were able to run this scenario successfully in our lab at the latest released 4.2.3.4. # /usr/lpp/mmfs/bin/mmdiag --version === mmdiag: version === Current GPFS build: "4.2.3.4 ". # /usr/lpp/mmfs/bin/mmces service list -a Enabled services: NFS node1.test.ibm.com: NFS is running # /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined File authentication configuration completed successfully. 
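With userdefined file authentication configured, the export command that was rejected earlier in this thread for missing authentication should no longer fail on that check; as a sketch, reusing Jonathon's path and client list (which are specific to his cluster):

/usr/lpp/mmfs/bin/mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)'
/usr/lpp/mmfs/bin/mmnfs export list    # confirm the export is now defined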
# rpm -qa | grep gpfs gpfs.ext-4.2.3-4.x86_64 gpfs.docs-4.2.3-4.noarch gpfs.gskit-8.0.50-75.x86_64 gpfs.gpl-4.2.3-4.noarch gpfs.msg.en_US-4.2.3-4.noarch nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 gpfs.base-4.2.3-4.x86_64 # rpm -qa | grep nfs-gan nfs-ganesha-utils-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/20/2017 12:07 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 42 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=BpVUgvFT2Qwgw0hveEgQaHFwn2mjeQjeBrkXHX_aC0A&m=2oGcWc1xx6zOclryoU2BdJykABuIR118zXTmSAA8msU&s=7q0JMYVHMSGlUAYquNMlrDRF6BDj6-76Oc4VbXrvlHE&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: export nfs share on gpfs with no authentication (Jonathon A Anderson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 20 Sep 2017 18:55:04 +0000 From: Jonathon A Anderson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Content-Type: text/plain; charset="us-ascii" I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. 
Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=ilYETqcaNr1y1ulWWDPjVg_X9pt35O1eYBTyFwJP56Y&m=VW8gJLSqT4rru6lFZXxCFp-Y3ngi6IUydv5czoG8kTE&s=deIQZQr-qfqLqW377yNysTJI8y7QJOdbokVjlnDr2d8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Mon Sep 25 08:40:34 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 25 Sep 2017 07:40:34 +0000 Subject: [gpfsug-discuss] SPectrum Scale on AWS Message-ID: I guess this is not news on this list, however I did see a reference to SpectrumScale on The Register this morning, which linked to this paper: https://s3.amazonaws.com/quickstart-reference/ibm/spectrum/scale/latest/doc/ibm-spectrum-scale-on-the-aws-cloud.pdf The article is here https://www.theregister.co.uk/2017/09/25/storage_super_club_sandwich/ 12 Terabyte Helium drives now available. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikeowen at thinkboxsoftware.com Mon Sep 25 10:26:21 2017 From: mikeowen at thinkboxsoftware.com (Mike Owen) Date: Mon, 25 Sep 2017 10:26:21 +0100 Subject: [gpfsug-discuss] SPectrum Scale on AWS In-Reply-To: References: Message-ID: Full PR release below: https://aws.amazon.com/about-aws/whats-new/2017/09/deploy-ibm-spectrum-scale-on-the-aws-cloud-with-new-quick-start/ Posted On: Sep 13, 2017 This new Quick Start automatically deploys a highly available IBM Spectrum Scale cluster with replication on the Amazon Web Services (AWS) Cloud, into a configuration of your choice. (A small cluster can be deployed in about 25 minutes.) IBM Spectrum Scale is a flexible, software-defined storage solution that can be deployed as highly available, high-performance file storage. 
It can scale in several dimensions, including performance (bandwidth and IOPS), capacity, and number of nodes that can mount the file system. The product?s high performance and scalability helps address the needs of applications whose performance (or performance-to-capacity ratio) demands cannot be met by traditional scale-up storage systems. The IBM Spectrum Scale software is being made available through a 90-day trial license evaluation program. This Quick Start automates the deployment of IBM Spectrum Scale on AWS for users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. The Quick Start deploys IBM Network Shared Disk (NSD) storage server instances and IBM Spectrum Scale compute instances into a virtual private cloud (VPC) in your AWS account. Data and metadata elements are replicated across two Availability Zones for optimal data protection. You can build a new VPC for IBM Spectrum Scale, or deploy the software into your existing VPC. The automated deployment provisions the IBM Spectrum Scale instances in Auto Scaling groups for instance scaling and management. The deployment and configuration tasks are automated by AWS CloudFormation templates that you can customize during launch. You can also use the templates as a starting point for your own implementation, by downloading them from the GitHub repository . The Quick Start includes a guide with step-by-step deployment and configuration instructions. To get started with IBM Spectrum Scale on AWS, use the following resources: - View the architecture and details - View the deployment guide - Browse and launch other AWS Quick Start reference deployments On 25 September 2017 at 08:40, John Hearns wrote: > I guess this is not news on this list, however I did see a reference to > SpectrumScale on The Register this morning, > > which linked to this paper: > > https://s3.amazonaws.com/quickstart-reference/ibm/ > spectrum/scale/latest/doc/ibm-spectrum-scale-on-the-aws-cloud.pdf > > > > The article is here https://www.theregister.co.uk/ > 2017/09/25/storage_super_club_sandwich/ > > 12 Terabyte Helium drives now available. > > > > > -- The information contained in this communication and any attachments is > confidential and may be privileged, and is for the sole use of the intended > recipient(s). Any unauthorized review, use, disclosure or distribution is > prohibited. Unless explicitly stated otherwise in the body of this > communication or the attachment thereto (if any), the information is > provided on an AS-IS basis without any express or implied warranties or > liabilities. To the extent you are relying on this information, you are > doing so at your own risk. If you are not the intended recipient, please > notify the sender immediately by replying to this message and destroy all > copies of this message and any attachments. Neither the sender nor the > company/group of companies he or she represents shall be liable for the > proper and complete transmission of the information contained in this > communication, or for any delay in its receipt. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Mon Sep 25 12:42:15 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 25 Sep 2017 11:42:15 +0000 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Message-ID: <018DE6B7-ADE3-4A01-B23C-9DB668FD95DB@nuance.com> Another data point for Keith/Kristy, I?ve been using Zimon for about 18 months now, and I?ll have to admit it?s been less than robust for long-term data. The biggest issue I?ve run into is the stability of the collector process. I have it crash on a fairly regular basis, most due to memory usage. This results in data loss You can configure it in a highly-available mode that should mitigate this to some degree. However, I don?t think IBM has published any details on how reliable the data collection process is. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Sunday, September 24, 2017 at 2:29 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Hi Keith, We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues with Zimon. However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time like with any round robin style data collection scheme. So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI which uses a postgres db (iirc, about to take off on a flight and can't check), or looked at the Grafana bridge which would get data into OpenTsdb format, again iirc. Anyway, just some things for consideration and a request to share back whatever you find out if it's off list. Thanks, getting stink eye to go to airplane mode. More later. Cheers Kristy On Sep 24, 2017 11:05 AM, "Keith Ball" > wrote: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. }, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. 
filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Sep 25 15:35:33 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 25 Sep 2017 14:35:33 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Message-ID: <1506350132.352.17.camel@imperial.ac.uk> Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Mon Sep 25 22:41:11 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 25 Sep 2017 21:41:11 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: <1506350132.352.17.camel@imperial.ac.uk> References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From christof.schmitt at us.ibm.com Mon Sep 25 22:41:11 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 25 Sep 2017 21:41:11 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: <1506350132.352.17.camel@imperial.ac.uk> References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 09:22:05 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 08:22:05 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 09:22:05 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 08:22:05 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: Hi Christof, thanks I?ll try it on a test cluster. 
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 10:59:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 09:59:13 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? 
Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 10:59:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 09:59:13 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 
Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Sep 26 21:49:09 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 26 Sep 2017 20:49:09 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: , <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Sep 27 09:02:51 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Sep 2017 08:02:51 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: , <1506350132.352.17.camel@imperial.ac.uk> Message-ID: I?m sorry, you?re right. I can only assume my brain was looking for an SID entry so when I saw Everyone:ALLOWED/FULL it didn?t process it at all. 4.2.3-4: [root at cesnode ~]# mmsmb exportacl list [testces] ACL:\Everyone:ALLOWED/FULL From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 26 September 2017 21:49 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? The default for the "export ACL" is always to allow access to "Everyone", so that the the "export ACL" does not limit access by default, but only the file system ACL. I do not have systems with these code levels at hand, could you show the difference you see between PTF2 and PTF4? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: "gpfsug-discuss at gpfsug.org" > Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Tue, Sep 26, 2017 2:59 AM There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. 
Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list > Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=B-AqKIRCmLBzoWAhGn7NY-ZASOX25NuP_c_ndE8gy4A&s=S06OD3mbRedYjfwETO8tUnlOjnWT7pOX8nsYX5ebIdA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Wed Sep 27 09:16:49 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Wed, 27 Sep 2017 10:16:49 +0200 Subject: [gpfsug-discuss] el7.4 compatibility Message-ID: Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! 
Kenneth From michael.holliday at crick.ac.uk Wed Sep 27 09:25:58 2017 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 27 Sep 2017 08:25:58 +0000 Subject: [gpfsug-discuss] File Quotas vs Inode Limits Message-ID: Hi All, I'm in process of setting up quota for our users. We currently have block quotas per file set, and an inode limit for each inode space. Our users have request more transparency relating to the inode limit as as it is they can't see any information. Are there any disadvantages to implementing file quotas, and increasing the inode limits so that they will not be reached? Michael Michael Holliday HPC Systems Engineer Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Sep 27 14:59:08 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Sep 2017 13:59:08 +0000 Subject: [gpfsug-discuss] File Quotas vs Inode Limits In-Reply-To: References: Message-ID: Actually you will get a benefit in that you can set up a callback so that users get alerted when they got over a soft quota. We also set up a fileset quota so that the callback will automatically notify users when they exceed their block and file quotas for their fileset as well. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Michael Holliday Sent: Wednesday, September 27, 2017 4:26 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] File Quotas vs Inode Limits Note: External Email ________________________________ Hi All, I'm in process of setting up quota for our users. We currently have block quotas per file set, and an inode limit for each inode space. Our users have request more transparency relating to the inode limit as as it is they can't see any information. Are there any disadvantages to implementing file quotas, and increasing the inode limits so that they will not be reached? Michael Michael Holliday HPC Systems Engineer Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
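A minimal sketch of the kind of setup Bryan describes, with placeholder file system, fileset, limits and script path rather than values from his cluster:

# per-fileset block and file quotas, soft limits below the hard limits
mmsetquota gpfs0:projects1 --block 10T:12T --files 5M:6M
# callback that runs a notification script whenever a soft quota is exceeded
mmaddcallback quotaNotify --command /usr/local/sbin/quota_notify.sh --event softQuotaExceeded --parms "%eventName %fsName"

The numbers are then visible through mmlsquota (for example mmlsquota -j projects1 gpfs0), which gives users rather more transparency than a bare inode limit on the inode space.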
URL: From Greg.Lehmann at csiro.au Thu Sep 28 00:44:53 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 27 Sep 2017 23:44:53 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: Message-ID: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Thu Sep 28 14:21:34 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Sep 2017 13:21:34 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: Please review this site: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au Sent: Wednesday, September 27, 2017 6:45 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] el7.4 compatibility Note: External Email ------------------------------------------------- I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From JRLang at uwyo.edu Thu Sep 28 15:18:52 2017 From: JRLang at uwyo.edu (Jeffrey R. 
Lang) Date: Thu, 28 Sep 2017 14:18:52 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: I just tired to build the GPFS GPL module against the latest version of RHEL 7.4 kernel and the build fails. The link below show that it should work. cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread kdump-kern.o: In function `GetOffset': kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' kdump-kern.o: In function `KernInit': kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' collect2: error: ld returned 1 exit status make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# uname -a Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux [root at bkupsvr3 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "4.2.2.3 ". Built on Mar 16 2017 at 11:19:59 In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my case 514.26.2 If I'm missing something can some one point me in the right direction? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, September 28, 2017 8:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Please review this site: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au Sent: Wednesday, September 27, 2017 6:45 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] el7.4 compatibility Note: External Email ------------------------------------------------- I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. 
Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Thu Sep 28 15:22:54 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 28 Sep 2017 16:22:54 +0200 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: <20170928142254.xwjvp3qwnilazer7@ics.muni.cz> You need 4.2.3.4 GPFS version and it will work. On Thu, Sep 28, 2017 at 02:18:52PM +0000, Jeffrey R. Lang wrote: > I just tired to build the GPFS GPL module against the latest version of RHEL 7.4 kernel and the build fails. The link below show that it should work. > > cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread > kdump-kern.o: In function `GetOffset': > kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' > kdump-kern.o: In function `KernInit': > kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' > collect2: error: ld returned 1 exit status > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# uname -a > Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux > [root at bkupsvr3 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "4.2.2.3 ". > Built on Mar 16 2017 at 11:19:59 > > In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my case 514.26.2 > > If I'm missing something can some one point me in the right direction? > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister > Sent: Thursday, September 28, 2017 8:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] el7.4 compatibility > > Please review this site: > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > > Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au > Sent: Wednesday, September 27, 2017 6:45 PM > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] el7.4 compatibility > > Note: External Email > ------------------------------------------------- > > I guess I may as well ask about SLES 12 SP3 as well! TIA. 
> > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman > Sent: Wednesday, 27 September 2017 6:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] el7.4 compatibility > > Hi, > > Is there already some information available of gpfs (and protocols) on > el7.4 ? > > Thanks! > > Kenneth > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek From S.J.Thompson at bham.ac.uk Thu Sep 28 15:23:53 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 28 Sep 2017 14:23:53 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: The 7.4 kernels are listed as having been tested by IBM. Having said that, we have clients running 7.4 kernel and its OK, but we are 4.2.3.4efix2, so bump versions... Simon On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jeffrey R. Lang" wrote: >I just tired to build the GPFS GPL module against the latest version of >RHEL 7.4 kernel and the build fails. The link below show that it should >work. > >cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >kdump-kern.o: In function `GetOffset': >kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >kdump-kern.o: In function `KernInit': >kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >collect2: error: ld returned 1 exit status >make[1]: *** [modules] Error 1 >make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >make: *** [Modules] Error 1 >-------------------------------------------------------- >mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >-------------------------------------------------------- >mmbuildgpl: Command failed. Examine previous error messages to determine >cause. 
>[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# uname -a >Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >[root at bkupsvr3 ~]# mmdiag --version > >=== mmdiag: version === >Current GPFS build: "4.2.2.3 ". >Built on Mar 16 2017 at 11:19:59 > >In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >case 514.26.2 > >If I'm missing something can some one point me in the right direction? > > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >Banister >Sent: Thursday, September 28, 2017 8:22 AM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Please review this site: > >https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > >Hope that helps, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >Greg.Lehmann at csiro.au >Sent: Wednesday, September 27, 2017 6:45 PM >To: gpfsug-discuss at spectrumscale.org >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Note: External Email >------------------------------------------------- > >I guess I may as well ask about SLES 12 SP3 as well! TIA. > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >Waegeman >Sent: Wednesday, 27 September 2017 6:17 PM >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] el7.4 compatibility > >Hi, > >Is there already some information available of gpfs (and protocols) on >el7.4 ? > >Thanks! > >Kenneth > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
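
For anyone else hitting the same page_offset_base link failure from mmbuildgpl on an el7.4 kernel, a rough recovery sketch is below. It assumes the node can be taken out of service briefly and that the cluster is moving to a GPFS level that handles the newer kernel (4.2.3.4 or later, as suggested in this thread); the package-name pattern is only illustrative.

# Record what the node is actually running before changing anything
uname -r                              # running kernel, e.g. 3.10.0-693.x on el7.4
cat /etc/redhat-release               # distro minor release
rpm -qa 'gpfs*'                       # installed Spectrum Scale packages
/usr/lpp/mmfs/bin/mmdiag --version    # build level the daemon reports

# After upgrading the GPFS packages to a level that supports this kernel,
# rebuild the portability layer and restart GPFS on the node
/usr/lpp/mmfs/bin/mmbuildgpl
/usr/lpp/mmfs/bin/mmshutdown
/usr/lpp/mmfs/bin/mmstartup

If mmbuildgpl still fails after the upgrade, the FAQ page linked above is the place to confirm which kernel levels have actually been tested.
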
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Thu Sep 28 15:36:04 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 28 Sep 2017 16:36:04 +0200 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: > The 7.4 kernels are listed as having been tested by IBM. Hi, Were did you find this? > > Having said that, we have clients running 7.4 kernel and its OK, but we > are 4.2.3.4efix2, so bump versions... Do you have some information about the efix2? Is this for 7.4 ? And where should we find this :-) Thank you! Kenneth > > Simon > > On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Jeffrey R. Lang" JRLang at uwyo.edu> wrote: > >> I just tired to build the GPFS GPL module against the latest version of >> RHEL 7.4 kernel and the build fails. The link below show that it should >> work. >> >> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >> kdump-kern.o: In function `GetOffset': >> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >> kdump-kern.o: In function `KernInit': >> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >> collect2: error: ld returned 1 exit status >> make[1]: *** [modules] Error 1 >> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >> make: *** [Modules] Error 1 >> -------------------------------------------------------- >> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >> -------------------------------------------------------- >> mmbuildgpl: Command failed. Examine previous error messages to determine >> cause. >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# uname -a >> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >> [root at bkupsvr3 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "4.2.2.3 ". >> Built on Mar 16 2017 at 11:19:59 >> >> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >> case 514.26.2 >> >> If I'm missing something can some one point me in the right direction? 
>> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >> Banister >> Sent: Thursday, September 28, 2017 8:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] el7.4 compatibility >> >> Please review this site: >> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html >> >> Hope that helps, >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >> Greg.Lehmann at csiro.au >> Sent: Wednesday, September 27, 2017 6:45 PM >> To: gpfsug-discuss at spectrumscale.org >> Subject: Re: [gpfsug-discuss] el7.4 compatibility >> >> Note: External Email >> ------------------------------------------------- >> >> I guess I may as well ask about SLES 12 SP3 as well! TIA. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >> Waegeman >> Sent: Wednesday, 27 September 2017 6:17 PM >> To: gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] el7.4 compatibility >> >> Hi, >> >> Is there already some information available of gpfs (and protocols) on >> el7.4 ? >> >> Thanks! >> >> Kenneth >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, >> and to please notify the sender immediately and destroy this email and >> any attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to >> the completeness or accuracy of this email or any attachments. This email >> is for informational purposes only and does not constitute a >> recommendation, offer, request or solicitation of any kind to buy, sell, >> subscribe, redeem or perform any type of transaction of a financial >> product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Sep 28 15:45:25 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 28 Sep 2017 14:45:25 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Aren't listed as tested Sorry ... 
4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but we >> are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf >> of Jeffrey R. Lang" >of >> JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version of >>> RHEL 7.4 kernel and the build fails. The link below show that it >>>should >>> work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine >>> cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >>> case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.ht >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. 
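
Before bumping versions along those lines, it can help to capture what every node currently runs so the upgrade order is clear. A small sketch, assuming mmdsh can reach all nodes from the node where it is run (the node class all and the grep pattern are just illustrative):

# Kernel and GPFS daemon build on each node
/usr/lpp/mmfs/bin/mmdsh -N all 'uname -r; /usr/lpp/mmfs/bin/mmdiag --version | grep "GPFS build"'
# Cluster-wide minimum release level, which caps the features that can be enabled
/usr/lpp/mmfs/bin/mmlsconfig minReleaseLevel
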
>>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >>> Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) >>> only and may contain proprietary, confidential or privileged >>>information. >>> If you are not the intended recipient, you are hereby notified that any >>> review, dissemination or copying of this email is strictly prohibited, >>> and to please notify the sender immediately and destroy this email and >>> any attachments. Email transmission cannot be guaranteed to be secure >>>or >>> error-free. The Company, therefore, does not make any guarantees as to >>> the completeness or accuracy of this email or any attachments. This >>>email >>> is for informational purposes only and does not constitute a >>> recommendation, offer, request or solicitation of any kind to buy, >>>sell, >>> subscribe, redeem or perform any type of transaction of a financial >>> product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From aaron.s.knister at nasa.gov Fri Sep 29 02:59:39 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Fri, 29 Sep 2017 01:59:39 +0000 Subject: [gpfsug-discuss] Latest recommended 4.2 efix? Message-ID: Hi Everyone, What?s the latest recommended efix release for 4.2.3.4? I?m working on testing a 4.1 to 4.2 migration and was reminded today of some fun bugs in 4.2.3.4 for which I think there are efixes. Alternatively, any word on a 4.2.3.5 release date? -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 29 10:02:26 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 29 Sep 2017 09:02:26 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Simon, I would appreciate a heads up on that AFM issue. I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is if a remote NFS mount goes down then an asynchronous operation such as a read can be stopped. I must admit to being not clued up on how the efixes are distributed. I downloaded the 4.2.3.4 installer for Linux yesterday. 
Should I be searching for additional fix packs on top of that (which I am in fact doing now). John H -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, September 28, 2017 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Aren't listed as tested Sorry ... 4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but >> we are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf of Jeffrey R. Lang" >on behalf of JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version >>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>should work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>> my case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? 
>>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Kenneth Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) >>> on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named >>>addressee(s) only and may contain proprietary, confidential or >>>privileged information. >>> If you are not the intended recipient, you are hereby notified that >>>any review, dissemination or copying of this email is strictly >>>prohibited, and to please notify the sender immediately and destroy >>>this email and any attachments. Email transmission cannot be >>>guaranteed to be secure or error-free. The Company, therefore, does >>>not make any guarantees as to the completeness or accuracy of this >>>email or any attachments. This email is for informational purposes >>>only and does not constitute a recommendation, offer, request or >>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>any type of transaction of a financial product. 
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >> sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >> rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >> 39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >> pw%3D&reserved=0 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6pw%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From r.sobey at imperial.ac.uk Fri Sep 29 10:04:49 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 29 Sep 2017 09:04:49 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Efixes (in my one time only limited experience!) come direct from IBM as a result of a PMR. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 29 September 2017 10:02 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Simon, I would appreciate a heads up on that AFM issue. I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is if a remote NFS mount goes down then an asynchronous operation such as a read can be stopped. 
I must admit to being not clued up on how the efixes are distributed. I downloaded the 4.2.3.4 installer for Linux yesterday. Should I be searching for additional fix packs on top of that (which I am in fact doing now). John H -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, September 28, 2017 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Aren't listed as tested Sorry ... 4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but >> we are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf of Jeffrey R. Lang" >on behalf of JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version >>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>should work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>> my case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? 
>>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Kenneth Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) >>> on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named >>>addressee(s) only and may contain proprietary, confidential or >>>privileged information. >>> If you are not the intended recipient, you are hereby notified that >>>any review, dissemination or copying of this email is strictly >>>prohibited, and to please notify the sender immediately and destroy >>>this email and any attachments. Email transmission cannot be >>>guaranteed to be secure or error-free. The Company, therefore, does >>>not make any guarantees as to the completeness or accuracy of this >>>email or any attachments. This email is for informational purposes >>>only and does not constitute a recommendation, offer, request or >>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>any type of transaction of a financial product. 
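
Because efixes are delivered as one-off builds through a PMR rather than from Fix Central, it is easy to lose track of which nodes actually have one installed. Two quick checks that may help; gpfs.base is the base package name here, adjust if your packaging differs:

# The build string reported by the daemon normally includes any efix level,
# which distinguishes a plain PTF node from one running an efix build
/usr/lpp/mmfs/bin/mmdiag --version
# The release field of the installed RPM usually carries the same information
rpm -q gpfs.base --qf '%{NAME}-%{VERSION}-%{RELEASE}\n'
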
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >> sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >> rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >> 39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >> pw%3D&reserved=0 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6pw%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri Sep 29 10:39:43 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 29 Sep 2017 09:39:43 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Correct they some from IBM support. The AFM issue we have (and is fixed in the efix) is if you have client code running on the AFM cache that uses truncate. The AFM write coalescing processing does something funny with it, so the file isn't truncated and then the data you write afterwards isn't copied back to home. We found this with ABAQUS code running on our HPC nodes onto the AFM cache, I.e. 
At home, the final packed output file from ABAQUS is corrupt as its the "untruncated and then filled" version of the file (so just a big blob of empty data). I would guess that anything using truncate would see the same issue. 4.2.3.x: APAR IV99796 See IBM Flash Alert at: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010629&myns=s033&mynp=O CSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E Its remedied in efix2, of course remember that an efix has not gone through a full testing validation cycle (otherwise it would be a PTF), but we have not seen any issues in our environments running 4.2.3.4efix2. Simon On 29/09/2017, 10:04, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A" wrote: >Efixes (in my one time only limited experience!) come direct from IBM as >a result of a PMR. >Richard > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns >Sent: 29 September 2017 10:02 >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Simon, >I would appreciate a heads up on that AFM issue. >I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is >if a remote NFS mount goes down then an asynchronous operation such as a >read can be stopped. > >I must admit to being not clued up on how the efixes are distributed. I >downloaded the 4.2.3.4 installer for Linux yesterday. >Should I be searching for additional fix packs on top of that (which I am >in fact doing now). > >John H > > > > > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon >Thompson (IT Research Support) >Sent: Thursday, September 28, 2017 4:45 PM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] el7.4 compatibility > > >Aren't listed as tested > >Sorry ... >4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM >issue we have. > >Simon > >On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" > wrote: > >> >> >>On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >>> The 7.4 kernels are listed as having been tested by IBM. >>Hi, >> >>Were did you find this? >>> >>> Having said that, we have clients running 7.4 kernel and its OK, but >>> we are 4.2.3.4efix2, so bump versions... >>Do you have some information about the efix2? Is this for 7.4 ? And >>where should we find this :-) >> >>Thank you! >> >>Kenneth >> >>> >>> Simon >>> >>> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>>behalf of Jeffrey R. Lang" >>on behalf of JRLang at uwyo.edu> wrote: >>> >>>> I just tired to build the GPFS GPL module against the latest version >>>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>>should work. >>>> >>>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>>> kdump-kern.o: In function `GetOffset': >>>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>>> kdump-kern.o: In function `KernInit': >>>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>>> collect2: error: ld returned 1 exit status >>>> make[1]: *** [modules] Error 1 >>>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>>> make: *** [Modules] Error 1 >>>> -------------------------------------------------------- >>>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT >>>>2017. 
>>>> -------------------------------------------------------- >>>> mmbuildgpl: Command failed. Examine previous error messages to >>>>determine cause. >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# >>>> [root at bkupsvr3 ~]# uname -a >>>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>>Sep 9 >>>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>>> [root at bkupsvr3 ~]# mmdiag --version >>>> >>>> === mmdiag: version === >>>> Current GPFS build: "4.2.2.3 ". >>>> Built on Mar 16 2017 at 11:19:59 >>>> >>>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>>> my case 514.26.2 >>>> >>>> If I'm missing something can some one point me in the right direction? >>>> >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org >>>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>>> Banister >>>> Sent: Thursday, September 28, 2017 8:22 AM >>>> To: gpfsug main discussion list >>>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>>> >>>> Please review this site: >>>> >>>> >>>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>>ml >>>> >>>> Hope that helps, >>>> -Bryan >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org >>>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>>> Greg.Lehmann at csiro.au >>>> Sent: Wednesday, September 27, 2017 6:45 PM >>>> To: gpfsug-discuss at spectrumscale.org >>>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>>> >>>> Note: External Email >>>> ------------------------------------------------- >>>> >>>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>>> >>>> -----Original Message----- >>>> From: gpfsug-discuss-bounces at spectrumscale.org >>>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>>> Kenneth Waegeman >>>> Sent: Wednesday, 27 September 2017 6:17 PM >>>> To: gpfsug-discuss at spectrumscale.org >>>> Subject: [gpfsug-discuss] el7.4 compatibility >>>> >>>> Hi, >>>> >>>> Is there already some information available of gpfs (and protocols) >>>> on >>>> el7.4 ? >>>> >>>> Thanks! 
>>>> >>>> Kenneth >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>>> tqc6pw%3D&reserved=0 _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>>> tqc6pw%3D&reserved=0 >>>> >>>> >>>> ________________________________ >>>> >>>> Note: This email is for the confidential use of the named >>>>addressee(s) only and may contain proprietary, confidential or >>>>privileged information. >>>> If you are not the intended recipient, you are hereby notified that >>>>any review, dissemination or copying of this email is strictly >>>>prohibited, and to please notify the sender immediately and destroy >>>>this email and any attachments. Email transmission cannot be >>>>guaranteed to be secure or error-free. The Company, therefore, does >>>>not make any guarantees as to the completeness or accuracy of this >>>>email or any attachments. This email is for informational purposes >>>>only and does not constitute a recommendation, offer, request or >>>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>>any type of transaction of a financial product. 
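
A rough way to check whether a given level is exposed to the truncate problem described above, before and after applying the efix. The file system name (fs1), fileset name (iw1) and home path are made up for the sketch; substitute your own AFM cache fileset and its home export:

# Run on a node with the AFM cache fileset mounted
F=/gpfs/fs1/iw1/truncate-test
dd if=/dev/urandom of=$F bs=1M count=10               # write a 10 MiB file in the cache
truncate -s 0 $F                                      # truncate it, as the application would
echo "data written after truncate" >> $F              # then append a small amount of new data
/usr/lpp/mmfs/bin/mmafmctl fs1 flushPending -j iw1    # push queued updates to home
# Once the queue drains, compare the cache copy with the copy at home; on an
# affected level the home copy reportedly keeps the pre-truncate size instead
# of matching the small cache copy
ls -l $F /nfs/home-export/iw1/truncate-test
md5sum $F /nfs/home-export/iw1/truncate-test
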
>>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> >>>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>>pw%3D&reserved=0 _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> >>>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>>pw%3D&reserved=0 >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>> sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>> rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>> 39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>> pw%3D&reserved=0 >> > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.o >rg%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml >.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a39d93e96cad61fc >%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6pw%3D&reserved=0 >-- The information contained in this communication and any attachments is >confidential and may be privileged, and is for the sole use of the >intended recipient(s). Any unauthorized review, use, disclosure or >distribution is prohibited. Unless explicitly stated otherwise in the >body of this communication or the attachment thereto (if any), the >information is provided on an AS-IS basis without any express or implied >warranties or liabilities. To the extent you are relying on this >information, you are doing so at your own risk. If you are not the >intended recipient, please notify the sender immediately by replying to >this message and destroy all copies of this message and any attachments. >Neither the sender nor the company/group of companies he or she >represents shall be liable for the proper and complete transmission of >the information contained in this communication, or for any delay in its >receipt. >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From scale at us.ibm.com Fri Sep 29 13:26:51 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 29 Sep 2017 07:26:51 -0500 Subject: [gpfsug-discuss] Latest recommended 4.2 efix? In-Reply-To: References: Message-ID: There isn't a "recommended" efix as such. Generally, fixes go into the next ptf so that they go through a test cycle. 
If a customer hits a serious issue that cannot wait for the next ptf, they can request an efix be built, but since efixes do not get the same level of rigorous testing as a ptf, they are not generally recommended unless you report an issue and service determines you need it. To address your other questions: We are currently up to efix3 on 4.2.3.4 We don't announce PTF dates, because they depend upon the testing; however, you can see that we generally release a PTF roughly every 6 weeks and I believe ptf4 was out on 8/24 Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: "discussion, gpfsug main" Date: 09/28/2017 08:59 PM Subject: [gpfsug-discuss] Latest recommended 4.2 efix? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Everyone, What?s the latest recommended efix release for 4.2.3.4? I?m working on testing a 4.1 to 4.2 migration and was reminded today of some fun bugs in 4.2.3.4 for which I think there are efixes. Alternatively, any word on a 4.2.3.5 release date? -Aaron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=IVcYH9EDg-UaA4Jt2GbsxN5XN1XbvejXTX0gAzNxtpM&s=9SmogyyA6QNSWxlZrpE-vBbslts0UexwJwPzp78LgKs&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sandeep.patil at in.ibm.com Sat Sep 30 05:02:22 2017 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Sat, 30 Sep 2017 09:32:22 +0530 Subject: [gpfsug-discuss] Spectrum Scale Enablement Material - 1H 2017 Message-ID: Hi Folks I was asked by Doris Conti to send the below to our Spectrum Scale User group. Below is a consolidated link that list all the enablement on Spectrum Scale/ESS that was done in 1H 2017 - which have blogs and videos from development and offering management. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media Do note, Spectrum Scale developers keep blogging on the below site which is worth bookmarking: https://developer.ibm.com/storage/blog/ (as recent as 4 new blogs in Sept) Thanks Sandeep Linkedin: https://www.linkedin.com/in/sandeeprpatil Spectrum Scale Dev. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oehmes at gmail.com Fri Sep 1 23:42:55 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 01 Sep 2017 22:42:55 +0000 Subject: Re: [gpfsug-discuss] Change to default for verbsRdmaMinBytes?
In-Reply-To: <20170901165625.6e4edd4c@osc.edu> References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi Ed, yes, the default has changed for customers who had not overridden it. The reason is that many systems in the field, including all ESS systems that come pre-tuned, were manually changed from the 16k default to 8k because of better performance confirmed in multiple customer engagements and tests with various settings. We therefore changed the default to what it should be in the field, so people no longer have to set it themselves (simplification) and get the better-performing value out of the box. All of this happened as part of the communication code overhaul, which led to significant (think factors) improvements in RPC performance for RDMA and VERBS workloads. There is another round of significant enhancements coming soon that will make even more parameters obsolete or change their defaults for better out-of-the-box performance. I see that we should probably communicate these changes better. Not that I think this will have any negative effect compared to your performance with the old setting; I am actually pretty confident that you will get better performance with the new code, and setting parameters back to default on most manually tuned systems will probably make them even faster. If you have a Scale client on 4.2.3+ you really shouldn't have anything set besides maxFilesToCache, pagepool, workerThreads and potentially prefetch; if you are a protocol node, add the settings specific to an export (e.g. SMB and NFS set some special settings). Pretty much everything else these days should be left at the default so the code can pick the correct parameters. If it isn't, and you get better performance by manual tweaking, I would like to hear about it. On the communication side, the next release will eliminate another set of parameters that are now auto-set, and we plan to work on NSD next. I have presented various slides about the communication and simplification changes in various forums; the latest public non-NDA slides I presented are here --> http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf Hope this helps. Sven On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl wrote: > Howdy. Just noticed this change to min RDMA packet size and I don't seem > to > see it in any patch notes. Maybe I just skipped the one where this > changed? > > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of things > before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 <(614)%20292-9302> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL:
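As a rough illustration of falling back to the defaults Sven describes (the attribute names come from this thread, the node class name is only an example, and this is a sketch to adapt rather than a tuning recommendation):

  mmdiag --config | grep verbsRdmaMinBytes      # value actually in effect on this node
  mmlsconfig verbsRdmaMinBytes                  # value recorded in the cluster configuration
  mmchconfig verbsRdmaMinBytes=DEFAULT          # drop the manual override and restore the shipped default
  mmchconfig verbsRdmasPerConnection=DEFAULT -N nsdnodes   # 'nsdnodes' is a hypothetical node class

Whether a change needs a daemon restart depends on the attribute, so re-check with mmdiag --config afterwards.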
From truongv at us.ibm.com Fri Sep 1 23:56:23 2017 From: truongv at us.ibm.com (Truong Vu) Date: Fri, 1 Sep 2017 18:56:23 -0400 Subject: Re: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: Message-ID: The discrepancy between the mmlsconfig view and mmdiag has been fixed in the GPFS 4.2.3 version. Note, mmdiag reports the correct default value. Tru. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Sat Sep 2 10:35:34 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Sat, 2 Sep 2017 09:35:34 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Message-ID: Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. Pid=5134 Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From truongv at us.ibm.com Sat Sep 2 12:40:15 2017 From: truongv at us.ibm.com (Truong Vu) Date: Sat, 2 Sep 2017 07:40:15 -0400 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: Message-ID: The dates that have the zone abbreviation are from the scripts which use the OS date command. The daemon has its own format. This inconsistency has been address in 4.2.2. From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/02/2017 07:00 AM Subject: gpfsug-discuss Digest, Vol 68, Issue 4 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=pd3-zi8UQxVOjxOYxqbuaFSvv_71WENUBJsw0KUV3ro&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Date formats inconsistent mmfs.log (Sobey, Richard A) ---------------------------------------------------------------------- Message: 1 Date: Sat, 2 Sep 2017 09:35:34 +0000 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Message-ID: Content-Type: text/plain; charset="us-ascii" Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. 
Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. Pid=5134 Cheers Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: < https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_pipermail_gpfsug-2Ddiscuss_attachments_20170902_4f65f336_attachment-2D0001.html&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=fNT71mM8obJ9rwxzm3Uzxw4mayi2pQg1u950E1raYK4&e= > ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM&m=wiPE5K_0qzTwdloCshNcSyamVNRJKz5WyOBal7dMz8w&s=pd3-zi8UQxVOjxOYxqbuaFSvv_71WENUBJsw0KUV3ro&e= End of gpfsug-discuss Digest, Vol 68, Issue 4 ********************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From john.hearns at asml.com Mon Sep 4 08:43:59 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 4 Sep 2017 07:43:59 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: Message-ID: Richard, The date format changed at an update level. We recently updated to 4.2.3 and when you run mmchconfig release=LATEST you are prompted to confirm that the new log format can be used. I guess you might not have cut all nodes over yet on your update over the weekend? Cut and paste from the documentation: mmfsLogTimeStampISO8601={yes | no} Setting this parameter to no allows the cluster to continue running with the earlier log time stamp format. For more information, see Security mode. * Set mmfsLogTimeStampISO8061 to no if you save log information and you are not yet ready to switch to the new log time stamp format. After you complete the migration, you can change the log time stamp format at any time with the mmchconfig command. * Omit this parameter if you are ready to switch to the new format. The default value is yes From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Saturday, September 02, 2017 11:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. 
Pid=5134 Cheers Richard -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Sep 4 09:05:10 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 4 Sep 2017 08:05:10 +0000 Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log In-Reply-To: References: , Message-ID: Ah. I'm running 4.2.3 but haven't changed the release level. I'll get that sorted out. Thanks for the replies! Get Outlook for Android ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of John Hearns Sent: Monday, September 4, 2017 8:43:59 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Date formats inconsistent mmfs.log Richard, The date format changed at an update level. We recently updated to 4.2.3 and when you run mmchconfig release=LATEST you are prompted to confirm that the new log format can be used. I guess you might not have cut all nodes over yet on your update over the weekend? Cut and paste from the documentation: mmfsLogTimeStampISO8601={yes | no} Setting this parameter to no allows the cluster to continue running with the earlier log time stamp format. For more information, see Security mode. ? Set mmfsLogTimeStampISO8061 to no if you save log information and you are not yet ready to switch to the new log time stamp format. After you complete the migration, you can change the log time stamp format at any time with the mmchconfig command. ? Omit this parameter if you are ready to switch to the new format. The default value is yes From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: Saturday, September 02, 2017 11:36 AM To: 'gpfsug-discuss at spectrumscale.org' Subject: [gpfsug-discuss] Date formats inconsistent mmfs.log Is there a good reason for the date formats in mmfs.log to be inconsistent? Apart from my OCD getting the better of me, it makes log analysis a bit difficult. Sat Sep 2 10:33:42.145 2017: [I] Command: successful mount gpfs Sat 2 Sep 10:33:42 BST 2017: finished mounting /dev/gpfs Sat Sep 2 10:33:42.168 2017: [I] Calling user exit script mmSysMonGpfsStartup: event startup, Async command /usr/lpp/mmfs/bin/mmsysmoncontrol. Sat Sep 2 10:33:42.190 2017: [I] Calling user exit script mmSinceShutdownRoleChange: event startup, Async command /usr/lpp/mmfs/bin/mmsysmonc. Sat 2 Sep 10:33:42 BST 2017: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor Sat 2 Sep 10:33:42 BST 2017: [I] The Spectrum Scale monitoring service is already running. 
Pid=5134 Cheers Richard -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Mon Sep 4 13:02:49 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Mon, 4 Sep 2017 12:02:49 +0000 Subject: [gpfsug-discuss] Looking for Use-Cases with Spectrum Scale / ESS with vRanger & VMware Message-ID: An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Mon Sep 4 17:48:20 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Mon, 4 Sep 2017 16:48:20 +0000 Subject: [gpfsug-discuss] Use AFM for migration of many small files Message-ID: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> Hello, We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and I need to look elsewhere to get better performance for prefetch of many smaller files? We will migrate several filesets in parallel, but still with individual filesets up to 350TB in size 150MB/s isn?t fun. Also just about 150 files/s seconds looks poor. The setup is quite new, hence there may be other places to look at. It?s all RHEL7 an spectrum scale 4.2.2-3 on the afm cache. Thank you, Heiner --, Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From vpuvvada at in.ibm.com Tue Sep 5 15:27:21 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Tue, 5 Sep 2017 19:57:21 +0530 Subject: [gpfsug-discuss] Use AFM for migration of many small files In-Reply-To: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> References: <467FB293-D33B-46F4-BA1B-A5CB4D28DDE6@psi.ch> Message-ID: Which version of Spectrum Scale ? What is the fileset mode ? >We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here >I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. How was the performance measured ? 
If parallel IO is enabled, AFM uses multiple gateway nodes to prefetch the large files (if file size if more than 1GB). Performance difference between small and lager file is huge (1000MB - 150MB = 850MB) here, and generally it is not the case. How many files were present in list file for prefetch ? Could you also share full internaldump from the gateway node ? >I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few >read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. AFM prefetches the files on multiple threads. Default flush threads for prefetch are 36 (fileset.afmNumFlushThreads (default 4) + afmNumIOFlushThreads (default 32)). >Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and >I need to look elsewhere to get better performance for prefetch of many smaller files? See above, AFM reads files on multiple threads parallelly. Try increasing the afmNumFlushThreads on fileset and verify if it improves the performance. ~Venkat (vpuvvada at in.ibm.com) From: "Billich Heinrich Rainer (PSI)" To: "gpfsug-discuss at spectrumscale.org" Date: 09/04/2017 10:18 PM Subject: [gpfsug-discuss] Use AFM for migration of many small files Sent by: gpfsug-discuss-bounces at spectrumscale.org Hello, We use AFM prefetch to migrate data between two clusters (using NFS). This works fine with large files, say 1+GB. But we have millions of smaller files, about 1MB each. Here I see just ~150MB/s ? compare this to the 1000+MB/s we get for larger files. I assume that we would need more parallelism, does prefetch pull just one file at a time? So each file needs some or many metadata operations plus a single or just a few read and writes. Doing this sequentially adds up all the latencies of NFS+GPFS. This is my explanation. With larger files gpfs prefetch on home will help. Please can anybody comment: Is this right, does AFM prefetch handle one file at a time in a sequential manner? And is there any way to change this behavior? Or am I wrong and I need to look elsewhere to get better performance for prefetch of many smaller files? We will migrate several filesets in parallel, but still with individual filesets up to 350TB in size 150MB/s isn?t fun. Also just about 150 files/s seconds looks poor. The setup is quite new, hence there may be other places to look at. It?s all RHEL7 an spectrum scale 4.2.2-3 on the afm cache. Thank you, Heiner --, Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://urldefense.proofpoint.com/v2/url?u=https-3A__www.psi.ch&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=eHcVdovN10-m-Qk0Ln2qvol3pkKNFwrzz2wgf1zXVXE&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=4y79Y-3M5sHV1Fm6aUFPEDIl8W5jxVP64XSlBsAYBb4&s=LbRyuSM_djs0FDXr27hPottQHAn3OGcivpyRcIDBN3U&e= -------------- next part -------------- An HTML attachment was scrubbed... 
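A sketch of what Venkat suggests, using a list file for the prefetch and a per-fileset override of the flush threads (file system name, fileset name, list file and thread count below are placeholders, and prefetch options differ slightly between releases, so check the mmafmctl man page for the level you run):

  mmafmctl fs0 prefetch -j ro_cache --list-file /tmp/smallfiles.list
  mmchfileset fs0 ro_cache -p afmNumFlushThreads=16
  mmlsfileset fs0 ro_cache --afm -L      # confirm the new value took effect

The list file is simply one fully qualified file name per line, which makes it easy to drive a large migration in batches.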
URL: From kenneth.waegeman at ugent.be Wed Sep 6 12:55:20 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Wed, 6 Sep 2017 13:55:20 +0200 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi Sven, I see two parameters that we have set to non-default values that are not in your list of options still to configure. verbsRdmasPerConnection (256) and socketMaxListenConnections (1024) I remember we had to set socketMaxListenConnections because our cluster consist of +550 nodes. Are these settings still needed, or is this also tackled in the code? Thank you!! Cheers, Kenneth On 02/09/17 00:42, Sven Oehme wrote: > Hi Ed, > > yes the defaults for that have changed for customers who had not > overridden the default settings. the reason we did this was that many > systems in the field including all ESS systems that come pre-tuned > where manually changed to 8k from the 16k default due to better > performance that was confirmed in multiple customer engagements and > tests with various settings , therefore we change the default to what > it should be in the field so people are not bothered to set it anymore > (simplification) or get benefits by changing the default to provides > better performance. > all this happened when we did the communication code overhaul that did > lead to significant (think factors) of improved RPC performance for > RDMA and VERBS workloads. > there is another round of significant enhancements coming soon , that > will make even more parameters either obsolete or change some of the > defaults for better out of the box performance. > i see that we should probably enhance the communication of this > changes, not that i think this will have any negative effect compared > to what your performance was with the old setting i am actually pretty > confident that you get better performance with the new code, but by > setting parameters back to default on most 'manual tuned' probably > makes your system even faster. > if you have a Scale Client on 4.2.3+ you really shouldn't have > anything set beside maxfilestocache, pagepool, workerthreads and > potential prefetch , if you are a protocol node, this and settings > specific to an export (e.g. SMB, NFS set some special settings) , > pretty much everything else these days should be set to default so the > code can pick the correct parameters., if its not and you get better > performance by manual tweaking something i like to hear about it. > on the communication side in the next release will eliminate another > set of parameters that are now 'auto set' and we plan to work on NSD > next. > i presented various slides about the communication and simplicity > changes in various forums, latest public non NDA slides i presented > are here --> > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf > > hope this helps . > > Sven > > > > On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl > wrote: > > Howdy. Just noticed this change to min RDMA packet size and I > don't seem to > see it in any patch notes. Maybe I just skipped the one where > this changed? 
> > mmlsconfig verbsRdmaMinBytes > verbsRdmaMinBytes 16384 > > (in case someone thinks we changed it) > > [root at proj-nsd01 ~]# mmlsconfig |grep verbs > verbsRdma enable > verbsRdma disable > verbsRdmasPerConnection 14 > verbsRdmasPerNode 1024 > verbsPorts mlx5_3/1 > verbsPorts mlx4_0 > verbsPorts mlx5_0 > verbsPorts mlx5_0 mlx5_1 > verbsPorts mlx4_1/1 > verbsPorts mlx4_1/2 > > > Oddly I also see this in config, though I've seen these kinds of > things before. > mmdiag --config |grep verbsRdmaMinBytes > verbsRdmaMinBytes 8192 > > We're on a recent efix. > Current GPFS build: "4.2.2.3 efix21 (1028007)". > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Wed Sep 6 13:22:41 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 6 Sep 2017 14:22:41 +0200 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Sep 6 13:29:44 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 6 Sep 2017 12:29:44 +0000 Subject: [gpfsug-discuss] Save the date! GPFS-UG meeting at SC17 - Sunday November 12th Message-ID: <7838054B-8A46-46A0-8A53-81E3049B4AE7@nuance.com> The 2017 Supercomputing conference is only 2 months away, and here?s a reminder to come early and attend the GPFS user group meeting. The meeting is tentatively scheduled from the afternoon of Sunday, November 12th. Exact location and times are still being discussed. If you have an interest in presenting at the user group meeting, please let us know. More details in the coming weeks. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From damir.krstic at gmail.com Wed Sep 6 13:35:45 2017 From: damir.krstic at gmail.com (Damir Krstic) Date: Wed, 06 Sep 2017 12:35:45 +0000 Subject: [gpfsug-discuss] filesets inside of filesets Message-ID: Today we have following fileset structure on our filesystem: /projects <-- gpfs filesystem /projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.: /projects/b1000 (b1000 has 10TB quota applied) /projects/b1000/backup (backup has 1TB quota applied) Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Sep 6 13:43:09 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Wed, 6 Sep 2017 12:43:09 +0000 Subject: [gpfsug-discuss] filesets inside of filesets In-Reply-To: References: Message-ID: Filesets in filesets are fine. BUT if you use scoped backups with TSM... 
Er Spectrum Protect, then there are restrictions on creating an IFS inside an IFS ... Simon From: > on behalf of "damir.krstic at gmail.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Wednesday, 6 September 2017 at 13:35 To: "gpfsug-discuss at spectrumscale.org" > Subject: [gpfsug-discuss] filesets inside of filesets Today we have following fileset structure on our filesystem: /projects <-- gpfs filesystem /projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.: /projects/b1000 (b1000 has 10TB quota applied) /projects/b1000/backup (backup has 1TB quota applied) Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Wed Sep 6 13:51:47 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 6 Sep 2017 14:51:47 +0200 Subject: [gpfsug-discuss] filesets inside of filesets In-Reply-To: References: Message-ID: Hello Damir, the files that belong to your fileset "backup" has a separate quota, it is not related to the quota in "b1000". There is no cumulative quota. Fileset Nesting may need other considerations as well, in some cases filesets behave different than simple directories. -> For NFSV4 ACLs, inheritance stops at the fileset boundaries -> Snapshots include the independent parent and the dependent children. Nested independent filesets are not included in a fileset snapshot. -> Export protocols like NFS or SMB will cross fileset boundaries and just treat them like a directory. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina K?deritz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Damir Krstic To: gpfsug main discussion list Date: 09/06/2017 02:36 PM Subject: [gpfsug-discuss] filesets inside of filesets Sent by: gpfsug-discuss-bounces at spectrumscale.org Today we have following fileset structure on our filesystem: /projects <-- gpfs filesystem /projects/b1000 <-- b1000 is a fileset with a fileset quota applied to it I need to create a fileset or a directory inside of this project and have separate quota applied to it e.g.: /projects/b1000 (b1000 has 10TB quota applied) /projects/b1000/backup (backup has 1TB quota applied) Is this possible? I am thinking nested filesets would work if GPFS supports that. Otherwise, I was going to create a separate filesystem, create corresponding backup filesets on it and symlink them to the /projects/ directory. Thanks in advance. Damir_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=5jyA3TazAAOckIeQUeIG0CJ4TG0aMWv7jDLDk3gYNkE&s=CbzPKTgh7mO6om2LTQr94LM1qfshrEdm58cJydejAfE&e= -------------- next part -------------- An HTML attachment was scrubbed... 
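For the layout Damir asks about, a nested fileset with its own quota can be sketched like this (file system device, fileset name, junction path and limits are placeholders; --inode-space new makes it an independent fileset, which is what the snapshot and scoped-backup caveats above refer to):

  mmcrfileset projects backup_b1000 --inode-space new
  mmlinkfileset projects backup_b1000 -J /projects/b1000/backup
  mmsetquota projects:backup_b1000 --block 1T:1T

A dependent fileset (created without --inode-space new) also carries its own fileset quota, so either flavour gives the separate 1TB limit on the backup directory.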
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1B378274.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Wed Sep 6 14:32:40 2017 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 06 Sep 2017 13:32:40 +0000 Subject: [gpfsug-discuss] Change to default for verbsRdmaMinBytes? In-Reply-To: References: <20170901165625.6e4edd4c@osc.edu> Message-ID: Hi, you still need both of them, but they are both on the list to be removed, the first is already integrated for the next major release, the 2nd we still work on. Sven On Wed, Sep 6, 2017 at 4:55 AM Kenneth Waegeman wrote: > Hi Sven, > > I see two parameters that we have set to non-default values that are not > in your list of options still to configure. > verbsRdmasPerConnection (256) and > socketMaxListenConnections (1024) > > I remember we had to set socketMaxListenConnections because our cluster > consist of +550 nodes. > > Are these settings still needed, or is this also tackled in the code? > > Thank you!! > > Cheers, > Kenneth > > > > On 02/09/17 00:42, Sven Oehme wrote: > > Hi Ed, > > yes the defaults for that have changed for customers who had not > overridden the default settings. the reason we did this was that many > systems in the field including all ESS systems that come pre-tuned where > manually changed to 8k from the 16k default due to better performance that > was confirmed in multiple customer engagements and tests with various > settings , therefore we change the default to what it should be in the > field so people are not bothered to set it anymore (simplification) or get > benefits by changing the default to provides better performance. > all this happened when we did the communication code overhaul that did > lead to significant (think factors) of improved RPC performance for RDMA > and VERBS workloads. > there is another round of significant enhancements coming soon , that will > make even more parameters either obsolete or change some of the defaults > for better out of the box performance. > i see that we should probably enhance the communication of this changes, > not that i think this will have any negative effect compared to what your > performance was with the old setting i am actually pretty confident that > you get better performance with the new code, but by setting parameters > back to default on most 'manual tuned' probably makes your system even > faster. > if you have a Scale Client on 4.2.3+ you really shouldn't have anything > set beside maxfilestocache, pagepool, workerthreads and potential prefetch > , if you are a protocol node, this and settings specific to an export > (e.g. SMB, NFS set some special settings) , pretty much everything else > these days should be set to default so the code can pick the correct > parameters., if its not and you get better performance by manual tweaking > something i like to hear about it. > on the communication side in the next release will eliminate another set > of parameters that are now 'auto set' and we plan to work on NSD next. 
> i presented various slides about the communication and simplicity changes > in various forums, latest public non NDA slides i presented are here --> > http://files.gpfsug.org/presentations/2017/Manchester/08_Research_Topics.pdf > > hope this helps . > > Sven > > > > On Fri, Sep 1, 2017 at 1:56 PM Edward Wahl < ewahl at osc.edu> > wrote: > >> Howdy. Just noticed this change to min RDMA packet size and I don't >> seem to >> see it in any patch notes. Maybe I just skipped the one where this >> changed? >> >> mmlsconfig verbsRdmaMinBytes >> verbsRdmaMinBytes 16384 >> >> (in case someone thinks we changed it) >> >> [root at proj-nsd01 ~]# mmlsconfig |grep verbs >> verbsRdma enable >> verbsRdma disable >> verbsRdmasPerConnection 14 >> verbsRdmasPerNode 1024 >> verbsPorts mlx5_3/1 >> verbsPorts mlx4_0 >> verbsPorts mlx5_0 >> verbsPorts mlx5_0 mlx5_1 >> verbsPorts mlx4_1/1 >> verbsPorts mlx4_1/2 >> >> >> Oddly I also see this in config, though I've seen these kinds of things >> before. >> mmdiag --config |grep verbsRdmaMinBytes >> verbsRdmaMinBytes 8192 >> >> We're on a recent efix. >> Current GPFS build: "4.2.2.3 efix21 (1028007)". >> >> -- >> >> Ed Wahl >> Ohio Supercomputer Center >> 614-292-9302 <%28614%29%20292-9302> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.orghttp://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From heiner.billich at psi.ch Wed Sep 6 17:16:18 2017 From: heiner.billich at psi.ch (Billich Heinrich Rainer (PSI)) Date: Wed, 6 Sep 2017 16:16:18 +0000 Subject: [gpfsug-discuss] Use AFM for migration of many small files Message-ID: <7D6EFD03-5D74-4A7B-A0E8-2AD41B050E15@psi.ch> Hello Venkateswara, Edward, Thank you for the comments on how to speed up AFM prefetch with small files. We run 4.2.2-3 and the AFM mode is RO and we have just a single gateway, i.e. no parallel reads for large files. We will try to increase the value of afmNumFlushThreads. It wasn?t clear to me that these threads do read from home, too - at least for prefetch. First I will try a plain NFS mount and see how parallel reads of many small files scale the throughput. Next I will try AFM prefetch. I don?t do nice benchmarking, just watching dstat output. We prefetch 100?000 files in one bunch, so there is ample time to observe. The basic issue is that we get just about 45MB/s for sequential read of many 1000 files with 1MB per file on the home cluster. I.e. we read one file at a time before we switch to the next. This is no surprise. Each read takes about 20ms to complete, so at max we get 50 reads of 1MB per second. We?ve seen this on classical raid storage and on DSS/ESS systems. It?s likely just the physics of spinning disks and the fact that we do one read at a time and don?t allow any parallelism. We wait for one or two I/Os to single disks to complete before we continue With larger files prefetch jumps in and fires many reads in parallel ? To get 1?000MB/s I need to do 1?000 read/s and need to have ~20 reads in progress in parallel all the time ? we?ll see how close we get to 1?000MB/s with ?many small files?. 
Kind regards, Heiner -- Paul Scherrer Institut Science IT Heiner Billich WHGA 106 CH 5232 Villigen PSI 056 310 36 02 https://www.psi.ch From stijn.deweirdt at ugent.be Wed Sep 6 18:13:48 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 6 Sep 2017 19:13:48 +0200 Subject: [gpfsug-discuss] mixed verbsRdmaSend Message-ID: Hi all, what is the expected behaviour of a mixed verbsRdmaSend setup, some nodes enabled, most disabled? We have some nodes with a very high IOPS workload, but most of the cluster of 500+ nodes does not have such a use case. We enabled verbsRdmaSend on the managers/quorum nodes (<10) and on the few (<10) clients with this workload, but not on the others (500+). It seems to work out fine, but is this an acceptable configuration? (The docs mention that enabling verbsRdmaSend on more than 100 nodes might lead to errors.) The nodes use IPoIB as the IP network, and running with verbsRdmaSend disabled on all nodes leads to an unstable cluster (TX errors, fewer than 1 error in 1M packets, on some clients, leading to GPFS expelling nodes etc.). (We still need to open a case with Mellanox to investigate further.) many thanks, stijn From gcorneau at us.ibm.com Thu Sep 7 00:30:23 2017 From: gcorneau at us.ibm.com (Glen Corneau) Date: Wed, 6 Sep 2017 18:30:23 -0500 Subject: [gpfsug-discuss] Happy 20th birthday GPFS !! Message-ID: Sorry I missed the anniversary of your conception (announcement letter) back on August 26th, so I hope you'll accept my belated congratulations on this long and exciting journey! https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS297-318 I remember your parent, PIOFS, as well! Ahh the fun times. ------------------ Glen Corneau Power Systems Washington Systems Center gcorneau at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/jpeg Size: 26117 bytes Desc: not available URL: From xhejtman at ics.muni.cz Thu Sep 7 16:07:20 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 7 Sep 2017 17:07:20 +0200 Subject: [gpfsug-discuss] Overwriting migrated files Message-ID: <20170907150720.h3t5fowvdlibvik4@ics.muni.cz> Hello, we have files of about 100GB each. Many of these files are migrated to tape (GPFS+TSM; tape storage is an external pool, and dsmmigrate/dsmrecall are in place). These files are images from the Bacula backup system. When Bacula wants to reuse one of the images, it needs to truncate the file to 64kB and overwrite it. Is there a way to avoid recalling the whole 100GB from tape just to truncate the file? I tried a partial recall: dsmrecall -D -size=65k Vol03797 After recall processing finished, I tried to truncate the file using: dd if=/dev/zero of=Vol03797 count=0 bs=64k seek=1 which caused further recall of the whole file: $ dsmls Vol03797 IBM Spectrum Protect Command Line Space Management Client Interface Client Version 8, Release 1, Level 2.0 Client date/time: 09/07/2017 15:01:59 (c) Copyright by IBM Corporation and other(s) 1990, 2017. All Rights Reserved. ActS ResS ResB FSt FName 107380819676 10485760 31373312 m (p) Vol03797 and the ResB size kept growing towards 107380819676. After dd finished: dsmls Vol03797 IBM Spectrum Protect Command Line Space Management Client Interface Client Version 8, Release 1, Level 2.0 Client date/time: 09/07/2017 15:08:03 (c) Copyright by IBM Corporation and other(s) 1990, 2017. All Rights Reserved. ActS ResS ResB FSt FName 65536 65536 64 r Vol03797 Is there another way to truncate the file and drop the whole migrated part? -- Lukáš Hejtmánek
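One way to narrow this down is to separate the truncation from the tool issuing it: coreutils truncate performs a single ftruncate() and writes no data, and dsmls shows whether the stub is being filled back in. Treat this purely as a way to observe the behaviour, not as a confirmed workaround, since the recall may be triggered by the truncate itself (file name as in the post):

  truncate -s 64k Vol03797     # plain ftruncate(), no data written
  dsmls Vol03797               # if ResB keeps growing towards ActS, the truncate is driving the recall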
From john.hearns at asml.com Thu Sep 7 16:15:00 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 7 Sep 2017 15:15:00 +0000 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig Message-ID: If I have an AFM setup where the home is located on a generic NFS share, let's say server:/volume/share When I come to set this up, does it make sense to run mmafmconfig on /volume/share? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Thu Sep 7 16:33:58 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Thu, 7 Sep 2017 15:33:58 +0000 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig In-Reply-To: References: Message-ID: I think you need to configure a gateway node (use mmchnode to change an existing node class to gateway) Then use mmafmconfig to set up export server maps on the gateway node. e.g. mmafmconfig -add "mapping1" -export-map "nfsServerIP"/"GatewayNode" (double quotes not required) mmafmconfig show all Map name: mapping1 Export server map: IP/GatewayNode From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 07 September 2017 16:15 To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig If I have an AFM setup where the home is located on a generic NFS share, let's say server:/volume/share When I come to set this up, does it make sense to run mmafmconfig on /volume/share? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From john.hearns at asml.com Thu Sep 7 16:52:19 2017 From: john.hearns at asml.com (John Hearns) Date: Thu, 7 Sep 2017 15:52:19 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Message-ID: Firmly lining myself up for a smack round the chops with a wet haddock... I try to delete an AFM cache fileset which I created a few days ago (it has an NFS home) mmdelfileset responds that: Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From janusz.malka at desy.de Thu Sep 7 20:23:36 2017 From: janusz.malka at desy.de (Malka, Janusz) Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> I had similar issue, I had to recover connection to home From: "John Hearns" To: "gpfsug main discussion list" Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock... I try to delete an AFM cache fileset which I created a few days ago (it has an NFS home) mmdelfileset responds that: Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user.
I find this reference, which is about as useful as a wet haddock: [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Thu Sep 7 22:16:34 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Thu, 7 Sep 2017 21:16:34 +0000 Subject: [gpfsug-discuss] SMB2 leases - oplocks - growing files In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 03:11:48 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 22:11:48 -0400 Subject: [gpfsug-discuss] mmfsd write behavior Message-ID: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Hi Everyone, This is something that's come up in the past and has recently resurfaced with a project I've been working on, and that is-- it seems to me as though mmfsd never attempts to flush the cache of the block devices its writing to (looking at blktrace output seems to confirm this). Is this actually the case? I've looked at the gpl headers for linux and I don't see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or REQ_FLUSH. I'm sure there's other ways to trigger this behavior that GPFS may very well be using that I've missed. That's why I'm asking :) I figure with FPO being pushed as an HDFS replacement using commodity drives this feature has *got* to be in the code somewhere. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From oehmes at gmail.com Fri Sep 8 03:55:14 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 08 Sep 2017 02:55:14 +0000 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: I am not sure what exactly you are looking for but all blockdevices are opened with O_DIRECT , we never cache anything on this layer . 
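To make the distinction in this sub-thread concrete: O_DIRECT bypasses the kernel page cache, but whether the data is on stable media still depends on the device's volatile write cache. A rough illustration from the shell (purely a sketch; /dev/sdX is a placeholder for a scratch disk, and these writes destroy whatever is on it):

# page cache bypassed, yet the drive's volatile write cache may still hold the data
dd if=/dev/zero of=/dev/sdX bs=1M count=16 oflag=direct
# O_DIRECT plus O_SYNC: each write should also force the data down to stable media
dd if=/dev/zero of=/dev/sdX bs=1M count=16 oflag=direct,sync
# query the drive's write-cache setting, or turn the cache off entirely
hdparm -W /dev/sdX
hdparm -W 0 /dev/sdX

Timing the two dd runs against a disk with its write cache enabled usually makes the difference visible.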
On Thu, Sep 7, 2017, 7:11 PM Aaron Knister wrote: > Hi Everyone, > > This is something that's come up in the past and has recently resurfaced > with a project I've been working on, and that is-- it seems to me as > though mmfsd never attempts to flush the cache of the block devices its > writing to (looking at blktrace output seems to confirm this). Is this > actually the case? I've looked at the gpl headers for linux and I don't > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > GPFS may very well be using that I've missed. That's why I'm asking :) > > I figure with FPO being pushed as an HDFS replacement using commodity > drives this feature has *got* to be in the code somewhere. > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 04:05:42 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 23:05:42 -0400 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: Thanks Sven. I didn't think GPFS itself was caching anything on that layer, but it's my understanding that O_DIRECT isn't sufficient to force I/O to be flushed (e.g. the device itself might have a volatile caching layer). Take someone using ZFS zvol's as NSDs. I can write() all day log to that zvol (even with O_DIRECT) but there is absolutely no guarantee those writes have been committed to stable storage and aren't just sitting in RAM until an fsync() occurs (or some other bio function that causes a flush). I also don't believe writing to a SATA drive with O_DIRECT will force cache flushes of the drive's writeback cache.. although I just tested that one and it seems to actually trigger a scsi cache sync. Interesting. -Aaron On 9/7/17 10:55 PM, Sven Oehme wrote: > I am not sure what exactly you are looking for but all blockdevices are > opened with O_DIRECT , we never cache anything on this layer . > > > On Thu, Sep 7, 2017, 7:11 PM Aaron Knister > wrote: > > Hi Everyone, > > This is something that's come up in the past and has recently resurfaced > with a project I've been working on, and that is-- it seems to me as > though mmfsd never attempts to flush the cache of the block devices its > writing to (looking at blktrace output seems to confirm this). Is this > actually the case? I've looked at the gpl headers for linux and I don't > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > GPFS may very well be using that I've missed. That's why I'm asking :) > > I figure with FPO being pushed as an HDFS replacement using commodity > drives this feature has *got* to be in the code somewhere. 
> > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Fri Sep 8 04:26:02 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Thu, 7 Sep 2017 23:26:02 -0400 Subject: [gpfsug-discuss] Happy 20th birthday GPFS !! In-Reply-To: References: Message-ID: <4a9feeb2-bb9d-8c9a-e506-926d8537cada@nasa.gov> Sounds like celebratory cake is in order for the users group in a few weeks ;) On 9/6/17 7:30 PM, Glen Corneau wrote: > Sorry I missed the anniversary of your conception ?(announcement letter) > back on August 26th, so I hope you'll accept my belated congratulations > on this long and exciting journey! > > https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS297-318 > > I remember your parent, PIOFS, as well! ?Ahh the fun times. > ------------------ > Glen Corneau > Power Systems > Washington Systems Center > gcorneau at us.ibm.com > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From vpuvvada at in.ibm.com Fri Sep 8 06:00:46 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 8 Sep 2017 10:30:46 +0530 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" To: gpfsug main discussion list Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org I had similar issue, I had to recover connection to home From: "John Hearns" To: "gpfsug main discussion list" Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. 
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 8 06:21:47 2017 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 8 Sep 2017 10:51:47 +0530 Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig In-Reply-To: References: Message-ID: mmafmconfig command should be run on the target path (path specified in the afmTarget option when fileset is created). If many filesets are sharing the same target (ex independent writer mode) , enable AFM once on target path. Run the command at home cluster. mmafmconifg enable afmTarget ~Venkat (vpuvvada at in.ibm.com) From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/07/2017 09:04 PM Subject: Re: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig Sent by: gpfsug-discuss-bounces at spectrumscale.org I think you need to configure a gateway node (use mmchnode to change an existing node class to gateway) Then use mmafmconfig to setup export server maps on the gateway node. e.g. mmafmconfig ?add ?mapping1? ?export-map ?nfsServerIP?/?GatewayNode? (double quotes not required) mafmconfig show all Map name: mapping1 Export server map: IP/GatewayNode From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 07 September 2017 16:15 To: gpfsug main discussion list Subject: [gpfsug-discuss] AFM from generic NFS share - mmafmconfig If I have an AFM setup where the home is located on a generic NFS share, let?s say server:/volume/share When I come ot set this up does it make sense to run mmafmconfig on /volume/share ? I can mount the share as a plain old NFS mount in order to run this operation, before I create the cache side fileset. Apologies if I am being dumb as an ox here, and I deserve to be slapped in the face with a wet fish https://en.wikipedia.org/wiki/The_Fish-Slapping_Dance -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). 
Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=kKlSEJqmVE6q8Qt02JNaDLsewp13C0yRAmlfc_djRkk&s=JIbuXlCiReZx3ws5__6juuGC-sAqM74296BuyzgyNYg&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From gellis at ocf.co.uk Fri Sep 8 08:04:51 2017 From: gellis at ocf.co.uk (Georgina Ellis) Date: Fri, 8 Sep 2017 07:04:51 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: Message-ID: <0CBB283A-A0A9-4FC9-A1CD-9E019D74CDB9@ocf.co.uk> I am still populating your lot 2 response - it is split across 3 word docs and a whole heap of emails so easier for me to keep going - I dropped u off a lot of emails to save filling your inbox :-) Could you poke around other tenders for the portal question please? Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. 
> > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** From john.hearns at asml.com Fri Sep 8 08:26:01 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 8 Sep 2017 07:26:01 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? 
Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gellis at ocf.co.uk Fri Sep 8 08:33:51 2017 From: gellis at ocf.co.uk (Georgina Ellis) Date: Fri, 8 Sep 2017 07:33:51 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: Message-ID: <93DCF805-F703-4ED5-A079-A44992A9268C@ocf.co.uk> Apologies All, slip of the keyboard and not a comment on GPFS! Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
> _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** From Sandra.McLaughlin at astrazeneca.com Fri Sep 8 10:12:02 2017 From: Sandra.McLaughlin at astrazeneca.com (McLaughlin, Sandra M) Date: Fri, 8 Sep 2017 09:12:02 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: John, I had a period when I had to delete and remake AFM filesets rather frequently ? this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. 
-- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 8 11:57:14 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 8 Sep 2017 10:57:14 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Sandra, Thankyou for the help. 
I have a support ticket outstanding, and will see what is suggested. I am sure this is a simple matter of deleting the fileset as you say! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McLaughlin, Sandra M Sent: Friday, September 08, 2017 11:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? John, I had a period when I had to delete and remake AFM filesets rather frequently ? this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. 
Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kkr at lbl.gov Fri Sep 8 11:58:05 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Fri, 8 Sep 2017 03:58:05 -0700 Subject: [gpfsug-discuss] Hold the Date - Spectrum Scale Day @ HPCXXL (Sept 2017, NYC) In-Reply-To: <6EF4187F-D8A1-4927-9E4F-4DF703DA04F5@lbl.gov> References: <6EF4187F-D8A1-4927-9E4F-4DF703DA04F5@lbl.gov> Message-ID: Hello, The agenda for the GPFS Day during HPCXXL is fairly fleshed out here: http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ See notes on registration below, which is free but required. Use the HPCXXL registration form, which has a $0 GPFS Day registration option. Hope to see some of you there. Best, Kristy > On Aug 21, 2017, at 3:33 PM, Kristy Kallback-Rose wrote: > > If you plan on attending the GPFS Day, please use the HPCXXL registration form (link to Eventbrite registration at the link below). The GPFS day is a free event, but you *must* register so we can make sure there are enough seats and food available. > > If you would like to speak or suggest a topic, please let me know. > > http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ > > The agenda is still being worked on, here are some likely topics: > > --RoadMap/Updates > --"New features - New Bugs? (Julich) > --GPFS + Openstack (CSCS) > --ORNL Update on Spider3-related GPFS work > --ANL Site Update > --File Corruption Session > > Best, > Kristy > >> On Aug 8, 2017, at 11:33 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> The GPFS Day of the HPCXXL conference is confirmed for Thursday, September 28th. Here is an updated URL, the agenda and registration are still being put together http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ . The GPFS Day will require registration, so we can make sure there is enough room (and coffee/food) for all attendees ?however, there will be no registration fee if you attend the GPFS Day only. >> >> I?ll send another update when the agenda is closer to settled. >> >> Cheers, >> Kristy >> >>> On Jul 7, 2017, at 3:32 PM, Kristy Kallback-Rose > wrote: >>> >>> Hello, >>> >>> More details will be provided as they become available, but just so you can make a placeholder on your calendar, there will be a Spectrum Scale Day the week of September 25th - 29th, likely Thursday, September 28th. >>> >>> This will be a part of the larger HPCXXL meeting (https://www.spxxl.org/?q=New-York-City-2017 ). You may recall this group was formerly called SPXXL and the website is in the process of transitioning to the new name (and getting a new certificate). You will be able to attend *just* the Spectrum Scale day if that is the only portion of the event you would like to attend. >>> >>> The NYC location is great for Spectrum Scale events because many IBMers, including developers, can come in from Poughkeepsie. >>> >>> More as we get closer to the date and details are settled. >>> >>> Cheers, >>> Kristy >> > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From hpc.ken.tw25qn at gmail.com Fri Sep 8 19:30:32 2017 From: hpc.ken.tw25qn at gmail.com (Ken Atkinson) Date: Fri, 8 Sep 2017 19:30:32 +0100 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: References: <93DCF805-F703-4ED5-A079-A44992A9268C@ocf.co.uk> Message-ID: Not on too many G&Ts Georgina? How are things. 
Ken Atkinson On 8 Sep 2017 08:33, "Georgina Ellis" wrote: Apologies All, slip of the keyboard and not a comment on GPFS! Sent from my iPhone > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > Send gpfsug-discuss mailing list submissions to > gpfsug-discuss at spectrumscale.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > or, via email, send a message with subject or body 'help' to > gpfsug-discuss-request at spectrumscale.org > > You can reach the person managing the list at > gpfsug-discuss-owner at spectrumscale.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of gpfsug-discuss digest..." > > > Today's Topics: > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > From: "Malka, Janusz" > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > Content-Type: text/plain; charset="utf-8" > > I had similar issue, I had to recover connection to home > > > From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > Mmdelfileset responds that : > > Fileset obfuscated has 1 fileset snapshot(s). > > > > When I try to delete the snapshot: > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > I find this reference, which is about as useful as a wet haddock: > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2. 3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2. 3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > The advice of the gallery is sought, please. > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Message: 2 > Date: Thu, 7 Sep 2017 21:16:34 +0000 > From: "Christof Schmitt" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > Message-ID: > > > Content-Type: text/plain; charset="us-ascii" > > An HTML attachment was scrubbed... > URL: > > ------------------------------ > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > ********************************************** _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Fri Sep 8 22:14:04 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Fri, 8 Sep 2017 17:14:04 -0400 Subject: [gpfsug-discuss] multicluster security In-Reply-To: References: <83936033-ce82-0a9b-3714-1dbea4c317db@nasa.gov> Message-ID: <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> Interesting! Thank you for the explanation. This makes me wish GPFS had a client access model that more closely mimicked parallel NAS, specifically for this reason. That then got me wondering about pNFS support. I've not been able to find much about that but in theory Ganesha supports pNFS. Does anyone know of successful pNFS testing with GPFS and if so how one would set up such a thing? -Aaron On 08/25/2017 06:41 PM, IBM Spectrum Scale wrote: > > Hi Aaron, > > If cluster A uses the mmauth command to grant a file system read-only > access to a remote cluster B, nodes on cluster B can only mount that > file system with read-only access. But the only checking being done at > the RPC level is the TLS authentication. This should prevent non-root > users from initiating RPCs, since TLS authentication requires access > to the local cluster's private key. However, a root user on cluster B, > having access to cluster B's private key, might be able to craft RPCs > that may allow one to work around the checks which are implemented at > the file system level. > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum > Scale (GPFS), then please post it to the public IBM developerWroks > Forum at > https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. > > > If your query concerns a potential software error in Spectrum Scale > (GPFS) and you have an IBM software maintenance contract please > contact 1-800-237-5511 in the United States or your local IBM Service > Center in other countries. > > The forum is informally monitored as time permits and should not be > used for priority messages to the Spectrum Scale (GPFS) team. > > Inactive hide details for Aaron Knister ---08/21/2017 11:04:06 PM---Hi > Everyone, I have a theoretical question about GPFS multiAaron Knister > ---08/21/2017 11:04:06 PM---Hi Everyone, I have a theoretical question > about GPFS multiclusters and security. 
> > From: Aaron Knister > To: gpfsug main discussion list > Date: 08/21/2017 11:04 PM > Subject: [gpfsug-discuss] multicluster security > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > ------------------------------------------------------------------------ > > > > Hi Everyone, > > I have a theoretical question about GPFS multiclusters and security. > Let's say I have clusters A and B. Cluster A is exporting a filesystem > as read-only to cluster B. > > Where does the authorization burden lay? Meaning, does the security rely > on mmfsd in cluster B to behave itself and enforce the conditions of the > multi-cluster export? Could someone using the credentials on a > compromised node in cluster B just start sending arbitrary nsd > read/write commands to the nsds from cluster A (or something along those > lines)? Do the NSD servers in cluster A do any sort of sanity or > security checking on the I/O requests coming from cluster B to the NSDs > they're serving to exported filesystems? > > I imagine any enforcement would go out the window with shared disks in a > multi-cluster environment since a compromised node could just "dd" over > the LUNs. > > Thanks! > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=oK_bEPbjuD7j6qLTHbe7HM4ujUlpcNYtX3tMW2QC7_w&s=BliMQ0pToLIIiO1jfyUp2Q3icewcONrcmHpsIj_hMtY&e= > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From oehmes at gmail.com Fri Sep 8 22:21:00 2017 From: oehmes at gmail.com (Sven Oehme) Date: Fri, 08 Sep 2017 21:21:00 +0000 Subject: [gpfsug-discuss] mmfsd write behavior In-Reply-To: References: <0f61621f-84d9-e249-0dd7-c1a4d50fea86@nasa.gov> Message-ID: Hi, the code assumption is that the underlying device has no volatile write cache, i was absolute sure we have that somewhere in the FAQ, but i couldn't find it, so i will talk to somebody to correct this. if i understand https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt correct one could enforce this by setting REQ_FUA, but thats not explicitly set today, at least i can't see it. i will discuss this with one of our devs who owns this code and come back. sven On Thu, Sep 7, 2017 at 8:05 PM Aaron Knister wrote: > Thanks Sven. I didn't think GPFS itself was caching anything on that > layer, but it's my understanding that O_DIRECT isn't sufficient to force > I/O to be flushed (e.g. the device itself might have a volatile caching > layer). Take someone using ZFS zvol's as NSDs. I can write() all day log > to that zvol (even with O_DIRECT) but there is absolutely no guarantee > those writes have been committed to stable storage and aren't just > sitting in RAM until an fsync() occurs (or some other bio function that > causes a flush). I also don't believe writing to a SATA drive with > O_DIRECT will force cache flushes of the drive's writeback cache.. 
> although I just tested that one and it seems to actually trigger a scsi > cache sync. Interesting. > > -Aaron > > On 9/7/17 10:55 PM, Sven Oehme wrote: > > I am not sure what exactly you are looking for but all blockdevices are > > opened with O_DIRECT , we never cache anything on this layer . > > > > > > On Thu, Sep 7, 2017, 7:11 PM Aaron Knister > > wrote: > > > > Hi Everyone, > > > > This is something that's come up in the past and has recently > resurfaced > > with a project I've been working on, and that is-- it seems to me as > > though mmfsd never attempts to flush the cache of the block devices > its > > writing to (looking at blktrace output seems to confirm this). Is > this > > actually the case? I've looked at the gpl headers for linux and I > don't > > see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH, or > > REQ_FLUSH. I'm sure there's other ways to trigger this behavior that > > GPFS may very well be using that I've missed. That's why I'm asking > :) > > > > I figure with FPO being pushed as an HDFS replacement using commodity > > drives this feature has *got* to be in the code somewhere. > > > > -Aaron > > > > -- > > Aaron Knister > > NASA Center for Climate Simulation (Code 606.2) > > Goddard Space Flight Center > > (301) 286-2776 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Sat Sep 9 09:05:31 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Sat, 9 Sep 2017 10:05:31 +0200 Subject: [gpfsug-discuss] multicluster security In-Reply-To: <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> References: <83936033-ce82-0a9b-3714-1dbea4c317db@nasa.gov> <529f575b-eb11-a81e-2905-ab3237d41678@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 105 bytes Desc: not available URL: From aaron.s.knister at nasa.gov Mon Sep 11 01:43:56 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 10 Sep 2017 20:43:56 -0400 Subject: [gpfsug-discuss] tuning parameters question Message-ID: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> Hi All (but mostly Sven), I stumbled across this great gem: files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf and I'm wondering which, if any, of those tuning parameters are still relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is particularly ugly and the storage doesn't appear to be bottlenecked. 
I see a lot of waiters like these: Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), reason 'waiting for LX lock' and I'm wondering if there's anything immediate one would suggest to help with that. -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From aaron.s.knister at nasa.gov Mon Sep 11 01:50:39 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Sun, 10 Sep 2017 20:50:39 -0400 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> Message-ID: <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> As an aside, my initial attempt was to use Ganesha via CES but the performance was significantly worse than CNFS for this workload. The docs seem to suggest that CNFS performs better for metadata intensive workloads which certainly seems to fit the bill here. -Aaron On 9/10/17 8:43 PM, Aaron Knister wrote: > Hi All (but mostly Sven), > > I stumbled across this great gem: > > files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf > > and I'm wondering which, if any, of those tuning parameters are still > relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm > exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is > particularly ugly and the storage doesn't appear to be bottlenecked. 
> > I see a lot of waiters like these: > > Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > reason 'waiting for LX lock' > > and I'm wondering if there's anything immediate one would suggest to > help with that. > > -Aaron > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From stefan.dietrich at desy.de Mon Sep 11 08:40:14 2017 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Mon, 11 Sep 2017 09:40:14 +0200 (CEST) Subject: [gpfsug-discuss] Switch from IPoIB connected mode to datagram with ESS 5.2.0? Message-ID: <743361352.9211728.1505115614463.JavaMail.zimbra@desy.de> Hello, during reading the upgrade docs for ESS 5.2.0, I noticed a change in the IPoIB mode. Now it specifies, that datagram (CONNECTED_MODE=no) instead of connected mode should be used. All earlier versions used connected mode. I am wondering about the reason for this change? Or is this only relevant for bonded IPoIB interfaces? Regards, Stefan -- ------------------------------------------------------------------------ Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) Ein Forschungszentrum der Helmholtz-Gemeinschaft Notkestr. 85 phone: +49-40-8998-4696 22607 Hamburg e-mail: stefan.dietrich at desy.de Germany ------------------------------------------------------------------------ From john.hearns at asml.com Mon Sep 11 08:41:54 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 11 Sep 2017 07:41:54 +0000 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? In-Reply-To: References: <950421843.15715748.1504812216286.JavaMail.zimbra@desy.de> Message-ID: Thankyou all for advice. The ?-p? option was the fix here (thankyou to IBM support). From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of McLaughlin, Sandra M Sent: Friday, September 08, 2017 11:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? John, I had a period when I had to delete and remake AFM filesets rather frequently ? 
this always worked for me: mmunlinkfileset device fset -f mmdelsnapshot device snapname -j fset mmdelfileset device fset -f Sandra From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 08 September 2017 08:26 To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Venkat, thankyou. I have a support ticket open on this issue, and will keep this option handy! From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Venkateswara R Puvvada Sent: Friday, September 08, 2017 7:01 AM To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Snapshots created by AFM (recovery, resync or peer snapshots) cannot be deleted by user using mmdelsnapshot command directly. After recovery or resync completion they get deleted automatically. For peer snapshots deletion mmpsnasp command is used. Which version of GPFS? Try with -p (undocumented) option. mmdelsnapshot device snapname -j fset -p ~Venkat (vpuvvada at in.ibm.com) From: "Malka, Janusz" > To: gpfsug main discussion list > Date: 09/08/2017 12:53 AM Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I had similar issue, I had to recover connection to home ________________________________ From: "John Hearns" > To: "gpfsug main discussion list" > Sent: Thursday, 7 September, 2017 17:52:19 Subject: [gpfsug-discuss] Deletion of a pcache snapshot? Firmly lining myself up for a smack round the chops with a wet haddock? I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) Mmdelfileset responds that : Fileset obfuscated has 1 fileset snapshot(s). When I try to delete the snapshot: Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. I find this reference, which is about as useful as a wet haddock: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm The advice of the gallery is sought, please. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. 
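Pulling the suggestions in this thread together, the sequence that worked
looks roughly like the following (device, fileset and snapshot names are
placeholders, and -p is an undocumented flag, so use it only as advised by
IBM support):

# Find the internal pcache recovery snapshot attached to the cache fileset
mmlssnapshot fs1 -j cachefset

# Unlink the cache fileset, force-delete the recovery snapshot with the
# undocumented -p option, then delete the fileset itself
mmunlinkfileset fs1 cachefset -f
mmdelsnapshot fs1 cachefset.afm.1234 -j cachefset -p
mmdelfileset fs1 cachefset -f
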
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=VX-Y-5GtodMzrpt1D71-1OOY3hu2UTJBg45sqxTHC8I&s=AmQf6iZeaanuNkB3Ys2lR8Ajk-n2TUJ6Wbt2z2pnbKI&e= -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. ________________________________ AstraZeneca UK Limited is a company incorporated in England and Wales with registered number:03674842 and its registered office at 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA. This e-mail and its attachments are intended for the above named recipient only and may contain confidential and privileged information. If they have come to you in error, you must not copy or show them to anyone; instead, please reply to this e-mail, highlighting the error to the sender and then immediately delete the message. For information about how AstraZeneca UK Limited and its affiliates may process information, personal data and monitor communications, please see our privacy notice at www.astrazeneca.com -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.weiser at de.ibm.com Mon Sep 11 09:11:15 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 11 Sep 2017 10:11:15 +0200 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> Message-ID: An HTML attachment was scrubbed... URL: From ed.swindelles at uconn.edu Mon Sep 11 16:49:15 2017 From: ed.swindelles at uconn.edu (Swindelles, Ed) Date: Mon, 11 Sep 2017 15:49:15 +0000 Subject: [gpfsug-discuss] UConn hiring GPFS administrator Message-ID: The University of Connecticut is hiring three full time, permanent technical positions for its HPC team on the Storrs campus. One of these positions is focused on storage administration, including a GPFS cluster. I would greatly appreciate it if you would forward this announcement to contacts of yours who may have an interest in these positions. Here are direct links to the job descriptions and applications: HPC Storage Administrator http://s.uconn.edu/3tx HPC Systems Administrator (2 positions to be filled) http://s.uconn.edu/3tw Thank you, -- Ed Swindelles Team Lead for Research Technology University of Connecticut 860-486-4522 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Sep 11 23:15:10 2017 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 11 Sep 2017 18:15:10 -0400 Subject: [gpfsug-discuss] tuning parameters question In-Reply-To: References: <21830c8a-4a00-c75a-76aa-772c261c00eb@nasa.gov> <25e8995a-7b5b-05d3-016a-4941fb75dcd6@nasa.gov> Message-ID: <9de64193-c60c-8ee1-b681-6cfe3993772b@nasa.gov> Thanks, Olaf. I ended up un-setting a bunch of settings that are now auto-tuned (worker1threads, worker3threads, etc.) and just set workerthreads as you suggest. That combined with increasing maxfilestocache to above the max concurrent open file threshold of the workload got me consistently with in 1%-3% of the performance of the same storage hardware running btrfs instead of GPFS. I think that's pretty darned good considering the additional complexity GPFS has over btrfs of being a clustered filesystem. Plus I now get NFS server failover for very little effort and without having to deal with corosync or pacemaker. -Aaron On 9/11/17 4:11 AM, Olaf Weiser wrote: > Hi Aaron , > > 0,0009 s response time for your meta data IO ... seems to be a very > good/fast storage BE.. which is hard to improve.. > you can raise the parallelism a bit for accessing metadata , but if this > will help to improve your "workload" is not assured > > The worker3threads parameter specifies the number of threads to use for > inode prefetch. Usually , I would suggest, that you should not touch > single parameters any longer. By the great improvements of the last few > releases.. GPFS can calculate / retrieve the right settings > semi-automatically... > You only need to set simpler "workerThreads" .. > > But in your case , you can see, if this more specific value will help > you out . > > depending on your blocksize and average filesize .. you may see > additional improvements when tuning nfsPrefetchStrategy , which tells > GPFS to consider all IOs wihtin */N/* blockboundaries as sequential ?and > starts prefetch > > l.b.n.t. set ignoreprefetchLunCount to yes .. (if not already done) . 
> this helps GPFS to use all available workerThreads > > cheers > olaf > > > > From: Aaron Knister > To: > Date: 09/11/2017 02:50 AM > Subject: Re: [gpfsug-discuss] tuning parameters question > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------------------------------------------------ > > > > As an aside, my initial attempt was to use Ganesha via CES but the > performance was significantly worse than CNFS for this workload. The > docs seem to suggest that CNFS performs better for metadata intensive > workloads which certainly seems to fit the bill here. > > -Aaron > > On 9/10/17 8:43 PM, Aaron Knister wrote: > > Hi All (but mostly Sven), > > > > I stumbled across this great gem: > > > > files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf > > > > and I'm wondering which, if any, of those tuning parameters are still > > relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm > > exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is > > particularly ugly and the storage doesn't appear to be bottlenecked. > > > > I see a lot of waiters like these: > > > > Waiting 0.0009 sec since 20:41:31, monitored, thread 2881 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26231 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26146 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 18637 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25013 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 27879 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 26553 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25334 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > Waiting 0.0009 sec since 20:41:31, monitored, thread 25337 > > InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar), > > reason 'waiting for LX lock' > > > > and I'm wondering if there's anything immediate one would suggest to > > help with that. 
> > > > -Aaron > > > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From zacekm at img.cas.cz Tue Sep 12 10:40:35 2017 From: zacekm at img.cas.cz (Michal Zacek) Date: Tue, 12 Sep 2017 11:40:35 +0200 Subject: [gpfsug-discuss] Wrong nodename after server restart Message-ID: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. 
gpfs version: 4.2.3-2 (CentOS 7) From secretary at gpfsug.org Tue Sep 12 15:22:41 2017 From: secretary at gpfsug.org (Secretary GPFS UG) Date: Tue, 12 Sep 2017 15:22:41 +0100 Subject: [gpfsug-discuss] SS UG UK 2018 Message-ID: Dear all, A date for your diary, #SSUG18 in the UK will be taking place on April 18th & 19th 2018. Please mark it in your diaries now! We'll confirm other details (venue, agenda etc.) nearer the time, but the date is confirmed. Thanks, -- Claire O'Toole Spectrum Scale/GPFS User Group Secretary +44 (0)7508 033896 www.spectrumscaleug.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Tue Sep 12 16:01:21 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 12 Sep 2017 11:01:21 -0400 Subject: [gpfsug-discuss] Wrong nodename after server restart In-Reply-To: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> References: <7e565699-b583-eeeb-c6b9-d11a39206331@img.cas.cz> Message-ID: Michal, When a node is added to a cluster that has a different domain than the rest of the nodes in the cluster, the GPFS daemons running on the various nodes can develop an inconsistent understanding of what the common suffix of all the domain names are. The symptoms you show with the "tsctl shownodes up" output, and in particular the incorrect node names of the two nodes you restarted, as seen on a node you did not restart, are consistent with this problem. I also note your cluster appears to have the necessary pre-condition to trip on this problem, whale.img.cas.cz does not share a common suffix with the other nodes in the cluster. The common suffix of the other nodes in the cluster is ".img.local". Was whale.img.cas.cz recently added to the cluster? Unfortunately, the general work-around is to recycle all the nodes at once: mmshutdown -a, followed by mmstartup -a. I hope this helps. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 09/12/2017 05:41 AM Subject: [gpfsug-discuss] Wrong nodename after server restart Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. 
Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. gpfs version: 4.2.3-2 (CentOS 7) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=l_sz-tPolX87WmSf2zBhhPpggnfQJKp7-BqV8euBp7A&s=XSPGkKRMza8PhYQg8AxeKW9cOTNeCI9uph486_6Xajo&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.kidger at uk.ibm.com Tue Sep 12 16:36:06 2017 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 12 Sep 2017 15:36:06 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 68, Issue 13 In-Reply-To: Message-ID: Well George is not the only one to have replied to the list with a one to one message. ? Remember folks, this mailing list has a *lot* of people on it. Hope my message is last that forgets who is in the 'To' field. Daniel Daniel Kidger Technical Sales Specialist, IBM UK IBM Spectrum Storage Software daniel.kidger at uk.ibm.com +44 (0)7818 522266 > On 8 Sep 2017, at 19:30, Ken Atkinson wrote: > > Not on too many G&Ts Georgina? > How are things. > Ken Atkinson > > On 8 Sep 2017 08:33, "Georgina Ellis" wrote: > Apologies All, slip of the keyboard and not a comment on GPFS! 
> > Sent from my iPhone > > > On 7 Sep 2017, at 22:16, "gpfsug-discuss-request at spectrumscale.org" wrote: > > > > Send gpfsug-discuss mailing list submissions to > > gpfsug-discuss at spectrumscale.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > or, via email, send a message with subject or body 'help' to > > gpfsug-discuss-request at spectrumscale.org > > > > You can reach the person managing the list at > > gpfsug-discuss-owner at spectrumscale.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of gpfsug-discuss digest..." > > > > > > Today's Topics: > > > > 1. Re: Deletion of a pcache snapshot? (Malka, Janusz) > > 2. Re: SMB2 leases - oplocks - growing files (Christof Schmitt) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Thu, 7 Sep 2017 21:23:36 +0200 (CEST) > > From: "Malka, Janusz" > > To: gpfsug main discussion list > > Subject: Re: [gpfsug-discuss] Deletion of a pcache snapshot? > > Message-ID: <950421843.15715748.1504812216286.JavaMail.zimbra at desy.de> > > Content-Type: text/plain; charset="utf-8" > > > > I had similar issue, I had to recover connection to home > > > > > > From: "John Hearns" > > To: "gpfsug main discussion list" > > Sent: Thursday, 7 September, 2017 17:52:19 > > Subject: [gpfsug-discuss] Deletion of a pcache snapshot? > > > > > > > > Firmly lining myself up for a smack round the chops with a wet haddock? > > > > I try to delete an AFM cache fileset which I create da few days ago (it has an NFS home) > > > > Mmdelfileset responds that : > > > > Fileset obfuscated has 1 fileset snapshot(s). > > > > > > > > When I try to delete the snapshot: > > > > Snapshot obfuscated.afm.1234 is an internal pcache recovery snapshot and cannot be deleted by user. > > > > > > > > I find this reference, which is about as useful as a wet haddock: > > > > [ https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm | https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/6027-3321.htm ] > > > > > > > > The advice of the gallery is sought, please. > > > > > > > > > > > > > > -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- > > An HTML attachment was scrubbed... 
> > URL: > > > > ------------------------------ > > > > Message: 2 > > Date: Thu, 7 Sep 2017 21:16:34 +0000 > > From: "Christof Schmitt" > > To: gpfsug-discuss at spectrumscale.org > > Cc: gpfsug-discuss at spectrumscale.org > > Subject: Re: [gpfsug-discuss] SMB2 leases - oplocks - growing files > > Message-ID: > > > > > > Content-Type: text/plain; charset="us-ascii" > > > > An HTML attachment was scrubbed... > > URL: > > > > ------------------------------ > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > End of gpfsug-discuss Digest, Vol 68, Issue 13 > > ********************************************** > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.mills at nasa.gov Tue Sep 12 17:06:23 2017 From: jonathan.mills at nasa.gov (Jonathan Mills) Date: Tue, 12 Sep 2017 12:06:23 -0400 (EDT) Subject: [gpfsug-discuss] Support for SLES 12 SP3 Message-ID: SLES 12 SP3 has been released. And for what it?s worth, there does not appear to be substantial changes in either kernel or glibc as compared to SLES 12 SP2. In fact, the latest SLES 12 SP2 kernel is ?4.4.74-92.29?, while the initial SLES 12 SP3 kernel is ?4.4.73-5.1?. Given this, I wanted to ask the team at IBM: 1) have you begun looking into SLES 12 SP3 yet? 2) if so, do you have any idea when you might release a fully supported version of Spectrum Scale for SLES 12 SP3? Those of us who run SLES and are looking to deploy new infrastructure this fall would prefer to do so on the latest rev of our OS, as opposed to one that is already on life support... -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From Greg.Lehmann at csiro.au Wed Sep 13 00:12:55 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Tue, 12 Sep 2017 23:12:55 +0000 Subject: [gpfsug-discuss] Support for SLES 12 SP3 In-Reply-To: References: Message-ID: <67f390a558244c41b154a7a6a9e5efe8@exch1-cdc.nexus.csiro.au> +1. We are interested in SLES 12 SP3 too. BTW had anybody done any comparisons of SLES 12 SP2 (4.4) kernel vs RHEL 7.3 in terms of GPFS IO performance? I would think the 4.4 kernel might give it an edge. I'll probably get around to comparing them myself one day, but if anyone else has some numbers... -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Mills Sent: Wednesday, 13 September 2017 2:06 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Support for SLES 12 SP3 SLES 12 SP3 has been released. And for what it?s worth, there does not appear to be substantial changes in either kernel or glibc as compared to SLES 12 SP2. In fact, the latest SLES 12 SP2 kernel is ?4.4.74-92.29?, while the initial SLES 12 SP3 kernel is ?4.4.73-5.1?. Given this, I wanted to ask the team at IBM: 1) have you begun looking into SLES 12 SP3 yet? 2) if so, do you have any idea when you might release a fully supported version of Spectrum Scale for SLES 12 SP3? 
Those of us who run SLES and are looking to deploy new infrastructure this fall would prefer to do so on the latest rev of our OS, as opposed to one that is already on life support... -- Jonathan Mills / jonathan.mills at nasa.gov NASA GSFC / NCCS HPC (606.2) Bldg 28, Rm. S230 / c. 252-412-5710 From scale at us.ibm.com Wed Sep 13 22:33:30 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 13 Sep 2017 17:33:30 -0400 Subject: [gpfsug-discuss] Fw: Wrong nodename after server restart Message-ID: ----- Forwarded by Eric Agar/Poughkeepsie/IBM on 09/13/2017 05:32 PM ----- From: IBM Spectrum Scale/Poughkeepsie/IBM To: Michal Zacek Date: 09/13/2017 05:29 PM Subject: Re: [gpfsug-discuss] Wrong nodename after server restart Sent by: Eric Agar Hello Michal, It should not be necessary to delete whale.img.cas.cz and rename it. But, that is an option you can take, if you prefer it. If you decide to take that option, please see the last paragraph of this response. The confusion starts at the moment a node is added to the active cluster where the new node does not have the same common domain suffix as the nodes that were already in the cluster. The confusion increases when the GPFS daemons on some nodes, but not all nodes, are recycled. Doing mmshutdown -a, followed by mmstartup -a, once after the new node has been added allows all GPFS daemons on all nodes to come up at the same time and arrive at the same answer to the question, "what is the common domain suffix for all the nodes in the cluster now?" In the case of your cluster, the answer will be "the common domain suffix is the empty string" or, put another way, "there is no common domain suffix"; that is okay, as long as all the GPFS daemons come to the same conclusion. After you recycle the cluster, you can check to make sure all seems well by running "tsctl shownodes up" on every node, and make sure the answer is correct on each node. If the mmshutdown -a / mmstartup -a recycle works, the problem should not recur with the current set of nodes in the cluster. Even as individual GPFS daemons are recycled going forward, they should still understand the cluster's nodes have no common domain suffix. However, I can imagine sequences of events that would cause the issue to occur again after nodes are deleted or added to the cluster while the cluster is active. For example, if whale.img.cas.cz were to be deleted from the current cluster, that action would restore the cluster to having a common domain suffix of ".img.local", but already running GPFS daemons would not realize it. If the delete of whale occurred while the cluster was active, subsequent recycling of the GPFS daemon on just a subset of the nodes would cause the recycled daemons to understand the common domain suffix to now be ".img.local". But, daemons that had not been recycled would still think there is no common domain suffix. The confusion would occur again. On the other hand, adding and deleting nodes to/from the cluster should not cause the issue to occur again as long as the cluster continues to have the same (in this case, no) common domain suffix. If you decide to delete whale.img.case.cz, rename it to have the ".img.local" domain suffix, and add it back to the cluster, it would be best to do so after all the GPFS daemons are shut down with mmshutdown -a, but before any of the daemons are restarted with mmstartup. This would allow all the subsequent running daemons to come to the conclusion that ".img.local" is now the common domain suffix. I hope this helps. 
Regards, Eric Agar Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: IBM Spectrum Scale Date: 09/13/2017 03:42 AM Subject: Re: [gpfsug-discuss] Wrong nodename after server restart Hello yes you are correct, Whale was added two days a go. It's necessary to delete whale.img.cas.cz from cluster before mmshutdown/mmstartup? If the two domains may cause problems in the future I can rename whale (and all planed nodes) to img.local suffix. Many thanks for the prompt reply. Regards Michal Dne 12.9.2017 v 17:01 IBM Spectrum Scale napsal(a): Michal, When a node is added to a cluster that has a different domain than the rest of the nodes in the cluster, the GPFS daemons running on the various nodes can develop an inconsistent understanding of what the common suffix of all the domain names are. The symptoms you show with the "tsctl shownodes up" output, and in particular the incorrect node names of the two nodes you restarted, as seen on a node you did not restart, are consistent with this problem. I also note your cluster appears to have the necessary pre-condition to trip on this problem, whale.img.cas.cz does not share a common suffix with the other nodes in the cluster. The common suffix of the other nodes in the cluster is ".img.local". Was whale.img.cas.cz recently added to the cluster? Unfortunately, the general work-around is to recycle all the nodes at once: mmshutdown -a, followed by mmstartup -a. I hope this helps. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Michal Zacek To: gpfsug-discuss at spectrumscale.org Date: 09/12/2017 05:41 AM Subject: [gpfsug-discuss] Wrong nodename after server restart Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I had to restart two of my gpfs servers (gpfs-n4 and gpfs-quorum) and after that I was unable to move CES IP address back with strange error "mmces address move: GPFS is down on this node". After I double checked that gpfs state is active on all nodes, I dug deeper and I think I found problem, but I don't really know how this could happen. 
Look at the names of nodes: [root at gpfs-n2 ~]# mmlscluster # Looks good GPFS cluster information ======================== GPFS cluster name: gpfscl1.img.local GPFS cluster id: 17792677515884116443 GPFS UID domain: img.local Remote shell command: /usr/bin/ssh Remote file copy command: /usr/bin/scp Repository type: CCR Node Daemon node name IP address Admin node name Designation ---------------------------------------------------------------------------------- 1 gpfs-n4.img.local 192.168.20.64 gpfs-n4.img.local quorum-manager 2 gpfs-quorum.img.local 192.168.20.60 gpfs-quorum.img.local quorum 3 gpfs-n3.img.local 192.168.20.63 gpfs-n3.img.local quorum-manager 4 tau.img.local 192.168.1.248 tau.img.local 5 gpfs-n1.img.local 192.168.20.61 gpfs-n1.img.local quorum-manager 6 gpfs-n2.img.local 192.168.20.62 gpfs-n2.img.local quorum-manager 8 whale.img.cas.cz 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# mmlsmount gpfs01 -L # not so good File system gpfs01 is mounted on 7 nodes: 192.168.20.63 gpfs-n3 192.168.20.61 gpfs-n1 192.168.20.62 gpfs-n2 192.168.1.248 tau 192.168.20.64 gpfs-n4.img.local 192.168.20.60 gpfs-quorum.img.local 147.231.150.108 whale.img.cas.cz [root at gpfs-n2 ~]# tsctl shownodes up | tr ',' '\n' # very wrong whale.img.cas.cz.img.local tau.img.local gpfs-quorum.img.local.img.local gpfs-n1.img.local gpfs-n2.img.local gpfs-n3.img.local gpfs-n4.img.local.img.local The "tsctl shownodes up" is the reason why I'm not able to move CES address back to gpfs-n4 node, but the real problem are different nodenames. I think OS is configured correctly: [root at gpfs-n4 /]# hostname gpfs-n4 [root at gpfs-n4 /]# hostname -f gpfs-n4.img.local [root at gpfs-n4 /]# cat /etc/resolv.conf nameserver 192.168.20.30 nameserver 147.231.150.2 search img.local domain img.local [root at gpfs-n4 /]# cat /etc/hosts | grep gpfs-n4 192.168.20.64 gpfs-n4.img.local gpfs-n4 [root at gpfs-n4 /]# host gpfs-n4 gpfs-n4.img.local has address 192.168.20.64 [root at gpfs-n4 /]# host 192.168.20.64 64.20.168.192.in-addr.arpa domain name pointer gpfs-n4.img.local. Can someone help me with this. Thanks, Michal p.s. gpfs version: 4.2.3-2 (CentOS 7) _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=l_sz-tPolX87WmSf2zBhhPpggnfQJKp7-BqV8euBp7A&s=XSPGkKRMza8PhYQg8AxeKW9cOTNeCI9uph486_6Xajo&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Michal ???ek | Information Technologies +420 296 443 128 +420 296 443 333 michal.zacek at img.cas.cz www.img.cas.cz Institute of Molecular Genetics of the ASCR, v. v. i., V?de?sk? 1083, 142 20 Prague 4, Czech Republic ID: 68378050 | VAT ID: CZ68378050 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 1997 bytes Desc: not available URL: From valdis.kletnieks at vt.edu Thu Sep 14 01:18:51 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 13 Sep 2017 20:18:51 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. Message-ID: <52657.1505348331@turing-police.cc.vt.edu> So we have a number of very similar policy files that get applied for file migration etc. 
And they vary drastically in the runtime to process, apparently due to different selections on whether to do the work in parallel. Running a set of rules with 'mmapplypolicy -I defer' that look like this: RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' THRESHOLD(0,100,0) WEIGHT(FILE_SIZE) TO POOL 'VBI_FILES' FOR FILESET('vbi') WHERE (mb_allocated >= 8) for 10 filesets can scan 325M directory entries in 6 minutes, and sort and evaluate the policy in 3 more minutes. However, this takes a bit over 30 minutes for the scan and another 20 for sorting and policy evaluation over the same set of filesets: RULE 'VBI_FILES_RULE' LIST 'pruned_files' THRESHOLD(90,80) WEIGHT(FILE_SIZE) FOR FILESET('vbi') WHERE (mb_allocated >= 8) even though the output is essentially identical. Why is LIST so much more expensive than 'MIGRATE" with '-I defer'? I could understand if I had an expensive SHOW clause, but there isn't one here (and a different policy that I run that *does* have a big SHOW clause takes almost the same amount of time as the minimal LIST).... I'm thinking that it has *something* to do with the MIGRATE job outputting: [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. while the LIST job says: [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. (...) [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. (Both output the same message during the 'Directory entries scanned: 0.' phase, but I suspect MIGRATE is multi-threading that part as well, as it completes much faster). What's the controlling factor in mmapplypolicy's decision whether or not to parallelize the policy? From oehmes at gmail.com Thu Sep 14 01:28:46 2017 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 14 Sep 2017 00:28:46 +0000 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: <52657.1505348331@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: can you please share the entire command line you are using ? also gpfs version, mmlsconfig output would help as well as if this is a shared storage filesystem or a system using local disks. thx. Sven On Wed, Sep 13, 2017 at 5:19 PM wrote: > So we have a number of very similar policy files that get applied for file > migration etc. And they vary drastically in the runtime to process, > apparently > due to different selections on whether to do the work in parallel. > > Running a set of rules with 'mmapplypolicy -I defer' that look like this: > > RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' > THRESHOLD(0,100,0) > WEIGHT(FILE_SIZE) > TO POOL 'VBI_FILES' > FOR FILESET('vbi') > WHERE (mb_allocated >= 8) > > for 10 filesets can scan 325M directory entries in 6 minutes, and sort and > evaluate the policy in 3 more minutes. > > However, this takes a bit over 30 minutes for the scan and another 20 for > sorting and policy evaluation over the same set of filesets: > > RULE 'VBI_FILES_RULE' LIST 'pruned_files' > THRESHOLD(90,80) > WEIGHT(FILE_SIZE) > FOR FILESET('vbi') > WHERE (mb_allocated >= 8) > > even though the output is essentially identical. Why is LIST so much more > expensive than 'MIGRATE" with '-I defer'? I could understand if I > had an > expensive SHOW clause, but there isn't one here (and a different policy > that I > run that *does* have a big SHOW clause takes almost the same amount of > time as > the minimal LIST).... 
> > I'm thinking that it has *something* to do with the MIGRATE job outputting: > > [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 > files scanned. > (...) > [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 > records scanned. > > while the LIST job says: > > [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. > (...) > [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. > > (Both output the same message during the 'Directory entries scanned: 0.' > phase, but I suspect MIGRATE is multi-threading that part as well, as it > completes much faster). > > What's the controlling factor in mmapplypolicy's decision whether or > not to parallelize the policy? > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kh.atmane at gmail.com Thu Sep 14 13:49:55 2017 From: kh.atmane at gmail.com (atmane) Date: Thu, 14 Sep 2017 13:49:55 +0100 Subject: [gpfsug-discuss] Disk change problem in gss GNR Message-ID: dear all, I change A Disk In Gss Storage Server mmchcarrier BB1RGL --release --pdisk 'e1d1s02' mmchcarrier BB1RGL --replace --pdisk 'e1d1s02' after replace disk Now I Have 2 Discs In My Gss the first disc was well changed name = "e1d1s02" the second disk still after I use this cmd mmdelpdisk BB1RGL --pdisk e1d1s02#004 -a the disk is still in use i need to reboot the system or ?? mmlspdisk all | less pdisk: replacementPriority = 1000 name = "e1d1s02" device = "/dev/sdik,/dev/sdih" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "ok" capacity = 3000034656256 freeSpace = 1453846429696 fru = "00W1572" location = "SV30820390-1-2" WWN = "naa.5000C5008D783E37" server = "gss0-ib0" pdisk: replacementPriority = 1000 name = "e1d1s02#004" device = "" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "missing/noPath/systemDrain/adminDrain/noRGD/noVCD" capacity = 3000034656256 freeSpace = 1599875317760 fru = "00W1572" location = "" WWN = "naa.5000C50056714E83" server = "gss0-ib0" -- -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz From makaplan at us.ibm.com Thu Sep 14 19:55:39 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 14 Sep 2017 14:55:39 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: <52657.1505348331@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: Read the doc again. Specify both -g and -N options on the command line to get fully parallel directory and inode/policy scanning. I'm curious as to what you're trying to do with THRESHOLD(0,100,0) ... Perhaps premigrate everything (that matches the other conditions)? You are correct about I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. If you don't see messages like that, you did not specify both -N and -g. From: valdis.kletnieks at vt.edu To: gpfsug-discuss at spectrumscale.org Date: 09/13/2017 08:19 PM Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. Sent by: gpfsug-discuss-bounces at spectrumscale.org So we have a number of very similar policy files that get applied for file migration etc. 
And they vary drastically in the runtime to process, apparently due to different selections on whether to do the work in parallel. Running a set of rules with 'mmapplypolicy -I defer' that look like this: RULE 'VBI_FILES_RULE' MIGRATE FROM POOL 'system' THRESHOLD(0,100,0) WEIGHT(FILE_SIZE) TO POOL 'VBI_FILES' FOR FILESET('vbi') WHERE (mb_allocated >= 8) for 10 filesets can scan 325M directory entries in 6 minutes, and sort and evaluate the policy in 3 more minutes. However, this takes a bit over 30 minutes for the scan and another 20 for sorting and policy evaluation over the same set of filesets: RULE 'VBI_FILES_RULE' LIST 'pruned_files' THRESHOLD(90,80) WEIGHT(FILE_SIZE) FOR FILESET('vbi') WHERE (mb_allocated >= 8) even though the output is essentially identical. Why is LIST so much more expensive than 'MIGRATE" with '-I defer'? I could understand if I had an expensive SHOW clause, but there isn't one here (and a different policy that I run that *does* have a big SHOW clause takes almost the same amount of time as the minimal LIST).... I'm thinking that it has *something* to do with the MIGRATE job outputting: [I] 2017-09-12 at 21:20:44.155 Parallel-piped sort and policy evaluation. 0 files scanned. (...) [I] 2017-09-12 at 21:24:14.672 Piped sorting and candidate file choosing. 0 records scanned. while the LIST job says: [I] 2017-09-12 at 13:58:06.926 Sorting 327627521 file list records. (...) [I] 2017-09-12 at 14:02:04.223 Policy evaluation. 0 files scanned. (Both output the same message during the 'Directory entries scanned: 0.' phase, but I suspect MIGRATE is multi-threading that part as well, as it completes much faster). What's the controlling factor in mmapplypolicy's decision whether or not to parallelize the policy? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8&m=SGbwD3m5mZ16_vwIFK8Ym48lwdF1tVktnSao0a_tkfA&s=sLt9AtZiZ0qZCKzuQoQuyxN76_R66jfAwQxdIY-w2m0&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Thu Sep 14 21:09:40 2017 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Thu, 14 Sep 2017 16:09:40 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. In-Reply-To: References: <52657.1505348331@turing-police.cc.vt.edu> Message-ID: <26551.1505419780@turing-police.cc.vt.edu> On Thu, 14 Sep 2017 14:55:39 -0400, "Marc A Kaplan" said: > Read the doc again. Specify both -g and -N options on the command line to > get fully parallel directory and inode/policy scanning. Yeah, figured that out, with help from somebody. :) > I'm curious as to what you're trying to do with THRESHOLD(0,100,0) ... > Perhaps premigrate everything (that matches the other conditions)? Yeah, it's actually feeding to LTFS/EE - where we premigrate everything that matches to tape. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From makaplan at us.ibm.com Thu Sep 14 22:13:59 2017 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 14 Sep 2017 17:13:59 -0400 Subject: [gpfsug-discuss] mmapplypolicy run time weirdness.. 
In-Reply-To: <26551.1505419780@turing-police.cc.vt.edu> References: <52657.1505348331@turing-police.cc.vt.edu> <26551.1505419780@turing-police.cc.vt.edu> Message-ID: BTW - we realize that mmapplypolicy -g and -N is a "gotcha" for some (many?) customer/admins -- so we're considering ways to make that easier -- but without "breaking" scripts and callbacks and what-have-yous that might depend on the current/old defaults... Always a balancing act -- considering that GPFS ne Spectrum Scale just hit its 20th birthday (by IBM reckoning) --marc of GPFS -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil.wilson at metoffice.gov.uk Fri Sep 15 11:47:19 2017 From: neil.wilson at metoffice.gov.uk (Wilson, Neil) Date: Fri, 15 Sep 2017 10:47:19 +0000 Subject: [gpfsug-discuss] ZIMON Sensors config files... Message-ID: Hi, Does anyone know how to use "mmperfmon config update" to get the "hostname =" field in the ZImonSensors.cfg file populated with the hostname of the node that it's been installed on? By default the field is empty and for some reason on our cluster it doesn't transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 15 16:37:13 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 15 Sep 2017 15:37:13 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? Message-ID: This is very probably off topic here.. I would be happy to get any responses off list. My question is has anyone here set up NFS re-export / proxy with nfs-ganesha? John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Mon Sep 18 01:14:52 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Mon, 18 Sep 2017 00:14:52 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? In-Reply-To: References: Message-ID: <5d1811f4d6ad4605bd2a7c7441f4dd1b@exch1-cdc.nexus.csiro.au> I am interested too, so maybe keep it on list? 
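Back on the mmapplypolicy thread: a minimal invocation that enables the fully parallel directory and inode scan Marc describes might look like the sketch below. It is an illustration only -- the file system name (gpfs01), node class (helperNodes), rule file path and work directory are placeholders, not values from the thread.

# hedged sketch -- gpfs01, helperNodes and /gpfs/gpfs01/.policytmp are made-up names
# -N : nodes (or a node class) that run the parallel scan/evaluation helpers
# -g : a global work directory on a shared file system; together with -N this
#      enables the "Parallel-piped sort and policy evaluation" path
mmapplypolicy gpfs01 -P /path/to/policy.rules \
    -N helperNodes \
    -g /gpfs/gpfs01/.policytmp \
    -I defer -L 1

Without both -N and -g, the run falls back to the serial "Sorting ... file list records" / "Policy evaluation" path that was observed for the LIST job.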
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: Saturday, 16 September 2017 1:37 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? This is very probably off topic here.. I would be happy to get any responses off list. My question is has anyone here set up NFS re-export / proxy with nfs-ganesha? John Hearns -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.lefebvre+gpfsug at calculquebec.ca Mon Sep 18 20:16:57 2017 From: richard.lefebvre+gpfsug at calculquebec.ca (Richard Lefebvre) Date: Mon, 18 Sep 2017 15:16:57 -0400 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Message-ID: Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 18 20:27:49 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 18 Sep 2017 19:27:49 +0000 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Message-ID: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> You do realize 3.5 is out of service, correct? You should be looking at upgrading :-) Catching this is real time, when you have a large number of nodes is going to be tough. How you recognizing that the file system is overloaded? Waiters? Looking at which nodes/NSDs have the longest/largest waiters may provide a clue. You might also take a look at mmpmon ? it?s a bit difficult to use in its raw state, but it does provide some good stats on a per file system basis. But you need to track these over times to get what you need. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Richard Lefebvre Reply-To: gpfsug main discussion list Date: Monday, September 18, 2017 at 2:18 PM To: gpfsug Subject: [EXTERNAL] [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From scale at us.ibm.com Tue Sep 19 07:47:42 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 19 Sep 2017 14:47:42 +0800 Subject: [gpfsug-discuss] ZIMON Sensors config files... In-Reply-To: References: Message-ID: Hi Neil, Have you tried these steps? mmperfmon config show --config-file /tmp/a vi /tmp/a mmperfmon config update --collectors oc8757286465 --config-file /tmp/a mmperfmon config show Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/15/2017 06:48 PM Subject: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone know how to use ?mmperfmon config update? to get the ?hostname =? field in the ZImonSensors.cfg file populated with the hostname of the node that it?s been installed on? By default the field is empty and for some reason on our cluster it doesn?t transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=JJA1q39zaRyjClihY50646c-CyY4ZvrmpSjR1qs5rTc&s=GWOiCpEHiZ_TqlFj0AeKmjcccnez-X2rHMa5UtvGPTk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Tue Sep 19 07:54:50 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Tue, 19 Sep 2017 14:54:50 +0800 Subject: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 In-Reply-To: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> References: <39FB5D56-A8C4-47DA-8A56-A2E453724875@nuance.com> Message-ID: Hi Richard, Is any of tool in https://www.ibm.com/developerworks/community/wikis/home?_escaped_fragment_=/wiki/General%2520Parallel%2520File%2520System%2520%2528GPFS%2529/page/Display%2520per%2520node%2520IO%2520statstics can help you? BTW, I agree with Bob that 3.5 is out-of-service. Without an extended service, you should consider to upgrade your cluster as soon as possible. 
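As a rough illustration of the mmpmon approach mentioned earlier in the thread (it is available on 3.5 as well as current releases), the sketch below samples cumulative per-file-system I/O counters; counters accumulate since daemon start (or the last reset request), so diffing successive samples gives the current rate. The node names in the second command are placeholders.

# parseable output (-p), 12 samples (-r), 5 seconds apart (-d, in milliseconds)
echo fs_io_s | /usr/lpp/mmfs/bin/mmpmon -p -r 12 -d 5000

# optionally gather the same counters for a small set of suspect nodes in one run
printf 'nlist add node001 node002\nfs_io_s\n' | /usr/lpp/mmfs/bin/mmpmon -p -r 12 -d 5000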
Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 09/19/2017 03:28 AM Subject: Re: [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Sent by: gpfsug-discuss-bounces at spectrumscale.org You do realize 3.5 is out of service, correct? You should be looking at upgrading :-) Catching this is real time, when you have a large number of nodes is going to be tough. How you recognizing that the file system is overloaded? Waiters? Looking at which nodes/NSDs have the longest/largest waiters may provide a clue. You might also take a look at mmpmon ? it?s a bit difficult to use in its raw state, but it does provide some good stats on a per file system basis. But you need to track these over times to get what you need. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Richard Lefebvre Reply-To: gpfsug main discussion list Date: Monday, September 18, 2017 at 2:18 PM To: gpfsug Subject: [EXTERNAL] [gpfsug-discuss] How to find which node is generating high iops in a GPFS 3.5 Hi I have a 3.5 GPFS system with 700+ nodes. I sometime have nodes that generate a lot of iops on the large file system but I cannot find the right tool to find which node is the source. I'm guessing under 4.2.X, there are now easy tools, but what can be done under GPFS 3.5. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=AYwUf61wv-Hq63KU7veQSxavdZy-e9eT9bkJFav8MVU&s=W42AQE74bvmOlw7P0D0wTqT0Rxop4KktnXeuDeGGdmk&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From rohwedder at de.ibm.com Tue Sep 19 08:42:46 2017 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 19 Sep 2017 09:42:46 +0200 Subject: [gpfsug-discuss] ZIMON Sensors config files... In-Reply-To: References: Message-ID: Hello Neil, While the description below provides a way on how to edit the hostname parameter, you should not have the need to edit the "hostname" parameter. Sensors use the hostname() call to get the hostname where the sensor is running and use this as key in the performance database, which is what you typically want to see. From the description you provide I assume you want to have a sensor running on every node that has the perfmon designation? 
There could be different issues: > In order to enable sensors on every node, you need to ensure there is no "restrict" clause in the sensor description, or the restrict clause has to be set correctly > There could be some other communication issue between sensors and collectors. Restart sensors and collectors and check the logfiles in /var/log/zimon/. You should be able to see which sensors start up and if they can connect. > Can you check if you have the perfmon designation set for the nodes where you expect data from (mmlscluster) Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzender des Aufsichtsrats: Martina K?deritz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "IBM Spectrum Scale" To: gpfsug main discussion list Cc: gpfsug-discuss-bounces at spectrumscale.org Date: 09/19/2017 08:48 AM Subject: Re: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Neil, Have you tried these steps? mmperfmon config show --config-file /tmp/a vi /tmp/a mmperfmon config update --collectors oc8757286465 --config-file /tmp/a mmperfmon config show Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 . If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. Inactive hide details for "Wilson, Neil" ---09/15/2017 06:48:26 PM---Hi, Does anyone know how to use "mmperfmon config update" "Wilson, Neil" ---09/15/2017 06:48:26 PM---Hi, Does anyone know how to use "mmperfmon config update" to get the "hostname =" field in the ZImon From: "Wilson, Neil" To: gpfsug main discussion list Date: 09/15/2017 06:48 PM Subject: [gpfsug-discuss] ZIMON Sensors config files... Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone know how to use ?mmperfmon config update? to get the ?hostname =? field in the ZImonSensors.cfg file populated with the hostname of the node that it?s been installed on? By default the field is empty and for some reason on our cluster it doesn?t transmit any metrics unless we put the node hostname into that field. Is there some kind of wildcard that I can set? Thanks Neil Neil Wilson Senior IT Practitioner Storage Team IT Services Met Office FitzRoy Road Exeter Devon EX1 3PB United Kingdom Tel: +44 (0)1392 885959 Email: neil.wilson at metoffice.gov.uk Website www.metoffice.gov.uk Our magazine Barometer is now available online at http://www.metoffice.gov.uk/barometer/ P Please consider the environment before printing this e-mail. Thank you. 
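A quick way to work through the points above on a 4.2.x cluster is sketched here. It assumes the standard systemd unit names for the Zimon sensors and collector and the default /var/log/zimon location; the node names are placeholders.

# which nodes carry the perfmon designation (sensors only report from these)
mmlscluster | grep -i perfmon
# add the designation where it is missing
mmchnode --perfmon -N nodeA,nodeB
# look for restrict clauses that limit where a sensor runs
mmperfmon config show | grep -i -B2 -A2 restrict
# restart sensors (on the sensor nodes) and the collector, then watch the logs
systemctl restart pmsensors
systemctl restart pmcollector
tail -f /var/log/zimon/ZIMonSensors.log    # exact log file name may vary by release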
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=JJA1q39zaRyjClihY50646c-CyY4ZvrmpSjR1qs5rTc&s=GWOiCpEHiZ_TqlFj0AeKmjcccnez-X2rHMa5UtvGPTk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=Ow2bpnoab1kboH2xuSUrbx65ALeoAAicG7csl1sV-Qc&s=qZ1XUXWfOayLSSuvcCyHQ2ZgY1mu0Zs3kmpgeVQUCYI&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1D696444.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From mnaineni at in.ibm.com Tue Sep 19 12:50:50 2017 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Tue, 19 Sep 2017 11:50:50 +0000 Subject: [gpfsug-discuss] NFS re-export with nfs-ganesha proxy? (Greg.Lehmann@csiro.au) Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Tue Sep 19 22:02:03 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Tue, 19 Sep 2017 21:02:03 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? Message-ID: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Wed Sep 20 00:39:37 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 19 Sep 2017 23:39:37 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? Message-ID: OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 02:21:36 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Tue, 19 Sep 2017 18:21:36 -0700 Subject: [gpfsug-discuss] RoCE not playing ball Message-ID: Hi All, Weirdness with a RoCE interface - verbs is not playing ball and is complaining about the inet6 address not matching up: 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version >= 1.1) loaded and initialized. 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981E1 state DOWN 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 with GID c081f9feff078a26. Please check if the correct inet6 address for the corresponding IP network interface is set 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid verbsPorts defined. Anyone run into this before? I have another node imaged the *exact* same way and no dice. Have tried a variety of drivers, cards, etc, same result every time. Cheers, Barry -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scale at us.ibm.com Wed Sep 20 04:07:18 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 20 Sep 2017 11:07:18 +0800 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: References: Message-ID: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. 
mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of "Buterbaugh, Kevin L" Reply-To: gpfsug main discussion list Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=mBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y&s=YJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From scale at us.ibm.com Wed Sep 20 04:33:16 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Wed, 20 Sep 2017 11:33:16 +0800 Subject: [gpfsug-discuss] Disk change problem in gss GNR In-Reply-To: References: Message-ID: Hi Atmane, In terms of this kind of disk management question, I would like to suggest to open a PMR to make IBM service help you. mmdelpdisk command would not need to reboot system to take effect. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: atmane To: "gpfsug-discuss at spectrumscale.org" Date: 09/14/2017 08:50 PM Subject: [gpfsug-discuss] Disk change problem in gss GNR Sent by: gpfsug-discuss-bounces at spectrumscale.org dear all, I change A Disk In Gss Storage Server mmchcarrier BB1RGL --release --pdisk 'e1d1s02' mmchcarrier BB1RGL --replace --pdisk 'e1d1s02' after replace disk Now I Have 2 Discs In My Gss the first disc was well changed name = "e1d1s02" the second disk still after I use this cmd mmdelpdisk BB1RGL --pdisk e1d1s02#004 -a the disk is still in use i need to reboot the system or ?? mmlspdisk all | less pdisk: replacementPriority = 1000 name = "e1d1s02" device = "/dev/sdik,/dev/sdih" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "ok" capacity = 3000034656256 freeSpace = 1453846429696 fru = "00W1572" location = "SV30820390-1-2" WWN = "naa.5000C5008D783E37" server = "gss0-ib0" pdisk: replacementPriority = 1000 name = "e1d1s02#004" device = "" recoveryGroup = "BB1RGL" declusteredArray = "DA1" state = "missing/noPath/systemDrain/adminDrain/noRGD/noVCD" capacity = 3000034656256 freeSpace = 1599875317760 fru = "00W1572" location = "" WWN = "naa.5000C50056714E83" server = "gss0-ib0" -- -- Atmane Khiredine HPC System Admin | Office National de la M?t?orologie T?l : +213 21 50 73 93 Poste 303 | Fax : +213 21 50 79 40 | E-mail : a.khiredine at meteo.dz _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIFbA&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hQ86ctTaI7i14NrB-58_SzqSWnCR8p6b5bFxtzNcSbk&s=mthjH7ebhnNlSJl71hFjF4wZU0iygm3I9wH_Bu7_3Ds&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Wed Sep 20 06:00:49 2017 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 20 Sep 2017 07:00:49 +0200 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From jonathon.anderson at colorado.edu Wed Sep 20 06:13:13 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 05:13:13 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , Message-ID: Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. 
I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From jonathon.anderson at colorado.edu Wed Sep 20 06:33:14 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 05:33:14 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: I should have said, here are the package versions: [root at sgate1 ~]# rpm -qa | grep gpfs gpfs.gpl-4.2.2-3.noarch gpfs.docs-4.2.2-3.noarch gpfs.base-4.2.2-3.x86_64 gpfs.gplbin-3.10.0-514.26.2.el7.x86_64-4.2.2-3.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm32_2.el7.x86_64 gpfs.ext-4.2.2-3.x86_64 gpfs.msg.en_US-4.2.2-3.noarch gpfs.gskit-8.0.50-57.x86_64 gpfs.gplbin-3.10.0-327.36.3.el7.x86_64-4.2.2-3.x86_64 ________________________________________ From: Jonathon A Anderson Sent: Tuesday, September 19, 2017 11:13:13 PM To: gpfsug main discussion list Cc: varun.mittal at in.ibm.com; Mark.Bush at siriuscom.com Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. 
Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. 
Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From gangqiu at cn.ibm.com Wed Sep 20 06:58:15 2017 From: gangqiu at cn.ibm.com (Gang Qiu) Date: Wed, 20 Sep 2017 13:58:15 +0800 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: Do you set ip address for these adapters? Refer to the description of verbsRdmaCm in ?Command and Programming Reference': If RDMA CM is enabled for a node, the node will only be able to establish RDMA connections using RDMA CM to other nodes with verbsRdmaCm enabled. RDMA CM enablement requires IPoIB (IP over InfiniBand) with an active IP address for each port. Although IPv6 must be enabled, the GPFS implementation of RDMA CM does not currently support IPv6 addresses, so an IPv4 address must be used. Regards, Gang Qiu ********************************************************************************************** IBM China Systems & Technology Lab Tel: 86-10-82452193 Fax: 86-10-82452312 Moble: 132-6134-8284 Email: gangqiu at cn.ibm.com Address: Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No. 8 Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, P.R.China ??????????????8???????28???????????100193 ********************************************************************************************** From: "Olaf Weiser" To: gpfsug main discussion list Date: 09/20/2017 01:01 PM Subject: Re: [gpfsug-discuss] RoCE not playing ball Sent by: gpfsug-discuss-bounces at spectrumscale.org is ib_read_bw working ? just test it between the two nodes ... From: Barry Evans To: gpfsug main discussion list Date: 09/20/2017 03:21 AM Subject: [gpfsug-discuss] RoCE not playing ball Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, Weirdness with a RoCE interface - verbs is not playing ball and is complaining about the inet6 address not matching up: 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version >= 1.1) loaded and initialized. 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)). 
2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981E1 state DOWN 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 with GID c081f9feff078a26. Please check if the correct inet6 address for the corresponding IP network interface is set 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid verbsPorts defined. Anyone run into this before? I have another node imaged the *exact* same way and no dice. Have tried a variety of drivers, cards, etc, same result every time. Cheers, Barry This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email._______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=NCthMXTjizwdEVDBqoDwAfRswiFbdQVHRb4mzseFLEM&m=u155tVFn5u91gqIsTXSOSVvpbR7GQRPoVpviUDH73R0&s=63nY5ozD8mej1jefNBZjLGCkNOFD9-swr-lc7CRPbrM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From tortay at cc.in2p3.fr Wed Sep 20 09:03:54 2017 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Wed, 20 Sep 2017 10:03:54 +0200 Subject: [gpfsug-discuss] CCR cluster down for the count? 
In-Reply-To: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> References: <26ABC473-387D-4D58-9059-518E455724A9@vanderbilt.edu> Message-ID: <853ffcf7-7900-457b-0d8a-2c63886ed245@cc.in2p3.fr> On 19/09/2017 23:02, Buterbaugh, Kevin L wrote: > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? > Hello, I have had the same issue multiple times. The "trick" is to execute "/usr/lpp/mmfs/bin/mmcommon startCcrMonitor" on a majority of quorum nodes (once they have the correct configuration files) to be able to start the cluster. I noticed a call to the above command in the "gpfs.gplbin" spec file in the "%postun" section (when doing RPM upgrades, if I'm not mistaken). . Lo?c. -- | Lo?c Tortay - IN2P3 Computing Centre | From r.sobey at imperial.ac.uk Wed Sep 20 09:23:37 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 20 Sep 2017 08:23:37 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , Message-ID: This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. 
I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). 
>> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From douglasof at us.ibm.com Wed Sep 20 09:28:44 2017 From: douglasof at us.ibm.com (Douglas O'flaherty) Date: Wed, 20 Sep 2017 08:28:44 +0000 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC Message-ID: Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. For more information http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ Doug Mobile -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckrafft at de.ibm.com Wed Sep 20 11:47:35 2017 From: ckrafft at de.ibm.com (Christoph Krafft) Date: Wed, 20 Sep 2017 12:47:35 +0200 Subject: [gpfsug-discuss] WANTED: Official support statement using Spectrum Scale 4.2.x with Oracle DB v12 Message-ID: Hi folks, is anyone aware if there is now an official support statement for Spectrum Scale 4.2.x? As far as my understanding goes - we currently have an "older" official support statement for v4.1 with Oracle. Many thanks up-front for any useful hints ... :) Mit freundlichen Gr??en / Sincerely Christoph Krafft Client Technical Specialist - Power Systems, IBM Systems Certified IT Specialist @ The Open Group Phone: +49 (0) 7034 643 2171 IBM Deutschland GmbH Mobile: +49 (0) 160 97 81 86 12 Am Weiher 24 Email: ckrafft at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 15225079.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Sep 20 14:55:28 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 20 Sep 2017 13:55:28 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: References: Message-ID: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. 
ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. 
Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? 
Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=mBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y&s=YJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 15:17:34 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Sep 2017 07:17:34 -0700 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: Yep, IP's set ok. We did try with ipv6 off to see what would happen, then turned it back on again. There are ipv6 addresses on the cards, but ipv4 is the only thing actually being used. On Tue, Sep 19, 2017 at 10:58 PM, Gang Qiu wrote: > > > > Do you set ip address for these adapters? > > Refer to the description of verbsRdmaCm in ?Command and Programming > Reference': > > If RDMA CM is enabled for a node, the node will only be able to establish > RDMA connections > using RDMA CM to other nodes with *verbsRdmaCm *enabled. RDMA CM > enablement requires > IPoIB (IP over InfiniBand) with an active IP address for each port. > Although IPv6 must be > enabled, the GPFS implementation of RDMA CM does not currently support > IPv6 addresses, so > an IPv4 address must be used. > > > > Regards, > Gang Qiu > > ************************************************************ > ********************************** > IBM China Systems & Technology Lab > Tel: 86-10-82452193 > Fax: 86-10-82452312 > Moble: 132-6134-8284 > Email: gangqiu at cn.ibm.com > Address: Ring Bldg. No.28 Building, Zhong Guan Cun Software Park, No. 8 > Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193, > P.R.China > ??????????????8???????28???????????100193 > ************************************************************ > ********************************** > > > > From: "Olaf Weiser" > To: gpfsug main discussion list > Date: 09/20/2017 01:01 PM > Subject: Re: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > is ib_read_bw working ? > just test it between the two nodes ... 
> > > > > From: Barry Evans > To: gpfsug main discussion list > Date: 09/20/2017 03:21 AM > Subject: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Weirdness with a RoCE interface - verbs is not playing ball and is > complaining about the inet6 address not matching up: > > 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes > verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version > >= 1.1) loaded and initialized. > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced > from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 > * nspdQueues 1)). > 2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981E1 state DOWN > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE > 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 > 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort > mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 > with GID c081f9feff078a26. Please check if the correct inet6 address for > the corresponding IP network interface is set > 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 > 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. > 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid > verbsPorts defined. > > > Anyone run into this before? I have another node imaged the *exact* same > way and no dice. Have tried a variety of drivers, cards, etc, same result > every time. > > Cheers, > Barry > > > > > > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. 
Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug. > org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r= > NCthMXTjizwdEVDBqoDwAfRswiFbdQVHRb4mzseFLEM&m= > u155tVFn5u91gqIsTXSOSVvpbR7GQRPoVpviUDH73R0&s= > 63nY5ozD8mej1jefNBZjLGCkNOFD9-swr-lc7CRPbrM&e= > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bevans at pixitmedia.com Wed Sep 20 15:23:21 2017 From: bevans at pixitmedia.com (Barry Evans) Date: Wed, 20 Sep 2017 07:23:21 -0700 Subject: [gpfsug-discuss] RoCE not playing ball In-Reply-To: References: Message-ID: It has worked, yes, and while the issue has been present. At the moment it's not working, but I'm not entirely surprised with the amount it's been poked at. Cheers, Barry On Tue, Sep 19, 2017 at 10:00 PM, Olaf Weiser wrote: > is ib_read_bw working ? > just test it between the two nodes ... > > > > > From: Barry Evans > To: gpfsug main discussion list > Date: 09/20/2017 03:21 AM > Subject: [gpfsug-discuss] RoCE not playing ball > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi All, > > Weirdness with a RoCE interface - verbs is not playing ball and is > complaining about the inet6 address not matching up: > > 2017-09-02_07:46:01.376+0100: [I] VERBS RDMA starting with verbsRdmaCm=yes > verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA library librdmacm.so (version > >= 1.1) loaded and initialized. > 2017-09-02_07:46:01.377+0100: [I] VERBS RDMA verbsRdmasPerNode reduced > from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 > * nspdQueues 1)). 
> 2017-09-02_07:46:01.382+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.383+0100: [I] VERBS RDMA discover mlx4_1 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.384+0100: [I] VERBS RDMA discover mlx4_1 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981E1 state DOWN > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x268A07FFFEF981C0 state ACTIVE > 2017-09-02_07:46:01.385+0100: [I] VERBS RDMA discover mlx4_0 port 1 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFFAC106404 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[0] subnet > 0xFE80000000000000 id 0x248A070001F981C1 state ACTIVE > 2017-09-02_07:46:01.386+0100: [I] VERBS RDMA discover mlx4_0 port 2 > transport IB link ETH NUMA node 0 pkey[0] 0xFFFF gid[1] subnet > 0x0000000000000000 id 0x0000FFFF0AC20011 state ACTIVE > 2017-09-02_07:46:01.387+0100: [I] VERBS RDMA parse verbsPorts mlx4_0/1 > 2017-09-02_07:46:01.390+0100: [W] VERBS RDMA parse error verbsPort > mlx4_0/1 ignored due to interface not found for port 1 of device mlx4_0 > with GID c081f9feff078a26. Please check if the correct inet6 address for > the corresponding IP network interface is set > 2017-09-02_07:46:01.390+0100: [E] VERBS RDMA: rdma_get_cm_event err -1 > 2017-09-02_07:46:01.391+0100: [I] VERBS RDMA library librdmacm.so unloaded. > 2017-09-02_07:46:01.391+0100: [E] VERBS RDMA failed to start, no valid > verbsPorts defined. > > > Anyone run into this before? I have another node imaged the *exact* same > way and no dice. Have tried a variety of drivers, cards, etc, same result > every time. > > Cheers, > Barry > > > > > > > This email is confidential in that it is intended for the exclusive > attention of the addressee(s) indicated. If you are not the intended > recipient, this email should not be read or disclosed to any other person. > Please notify the sender immediately and delete this email from your > computer system. Any opinions expressed are not necessarily those of the > company from which this email was sent and, whilst to the best of our > knowledge no viruses or defects exist, no responsibility can be accepted > for any loss or damage arising from its receipt or subsequent use of this > email._______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. 
Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Wed Sep 20 17:00:15 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 20 Sep 2017 09:00:15 -0700 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC In-Reply-To: References: Message-ID: Thanks Doug. If you plan to go, *do register*. GPFS Day is free, but we need to know how many will attend. Register using the link on the HPCXXL event page below. Cheers, Kristy > On Sep 20, 2017, at 1:28 AM, Douglas O'flaherty wrote: > > > Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. > > > For more information > http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ > > Doug > > Mobile > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Wed Sep 20 17:27:48 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Wed, 20 Sep 2017 16:27:48 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <20170920114844.6bf9f27b@osc.edu> References: <20170920114844.6bf9f27b@osc.edu> Message-ID: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Hi Ed, Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... Kevin /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 2983 1 0 Sep18 ? 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 6512 1 0 Sep18 ? 
00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes 3,0,10.0.6.215,,testnsd3.vampire 1,0,10.0.6.213,,testnsd1.vampire 2,0,10.0.6.214,,testnsd2.vampire /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 
1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen root at testnsd2# On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: I've run into this before. We didn't use to use CCR. And restoring nodes for us is a major pain in the rear as we only allow one-way root SSH, so we have a number of useful little scripts to work around problems like this. Assuming that you have all the necessary files copied to the correct places, you can manually kick off CCR. I think my script does something like: (copy the encryption key info) scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor you should then see like 2 copies of it running under mmksh. Ed On Wed, 20 Sep 2017 13:55:28 +0000 "Buterbaugh, Kevin L" > wrote: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. 
ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. 
vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. 
Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Sep 20 18:48:26 2017 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 20 Sep 2017 19:48:26 +0200 Subject: [gpfsug-discuss] CCR cluster down for the count? 
In-Reply-To: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Message-ID: <1f0b2657-8ca3-7b35-95f3-7c4edb6c0818@ugent.be> hi kevin, we were hit by similar issue when we did something not so smart: we had a 5 node quorum, and we wanted to replace 1 test node with 3 more production quorum node. we however first removed the test node, and then with 4 quorum nodes we did mmshutdown for some other config modifications. when we tried to start it, we hit the same "Not enough CCR quorum nodes available" errors. also, none of the ccr commands were helpful; they also hanged, even simple ones like show etc etc. what we did in the end was the following (and some try-and-error): from the /var/adm/ras/mmsdrserv.log logfiles we guessed that we had some sort of split brain paxos cluster (some reported " ccrd: recovery complete (rc 809)", some same message with 'rc 0' and some didn't have the recovery complete on the last line(s)) * stop ccr everywhere mmshutdown -a mmdsh -N all pkill -9 -f mmccr * one by one, start the paxos cluster using mmshutdown on the quorum nodes (mmshutdown will start ccr and there is no unit or something to help with that). * the nodes will join after 3-4 minutes and report "recovery complete"; wait for it before you start another one * the trial-and-error part was that sometimes there was recovery complete with rc=809, sometimes with rc=0. in the end, once they all had same rc=0, paxos was happy again and eg mmlsconfig worked again. this left a very bad experience with CCR with us, but we want to use ces, so no real alternative (and to be honest, with odd number of quorum, we saw no more issues, everyting was smooth). in particular we were missing * unit files for all extra services that gpfs launched (mmccrmoniotr, mmsysmon); so we can monitor and start/stop them cleanly * ccr commands that work with broken paxos setup; eg to report that the paxos cluster is broken or operating in some split-brain mode. anyway, YMMV and good luck. stijn On 09/20/2017 06:27 PM, Buterbaugh, Kevin L wrote: > Hi Ed, > > Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. > > Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... > > Kevin > > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort > testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd2: root 2983 1 0 Sep18 ? 
00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 6512 1 0 Sep18 ? 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort > testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached > testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed > testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes > testdellnode1: total 12 > testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed > testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached > testgateway: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testgateway: total 12 > testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached > testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed > testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks > testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth > testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes > testnsd1: total 8 > testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached > testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed > testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 > testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 > testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks > testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd2: total 16 > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed > testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks > testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth > testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd3: total 8 > testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed > testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached > testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testsched: total 12 > /var/mmfs/gen > root at testnsd2# more ../ccr/ccr.nodes > 3,0,10.0.6.215,,testnsd3.vampire > 1,0,10.0.6.213,,testnsd1.vampire > 2,0,10.0.6.214,,testnsd2.vampire > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" > testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs > testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs > testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs > testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > testsched: -rw-r--r--. 
1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" > testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" > testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > /var/mmfs/gen > root at testnsd2# > > On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: > > I've run into this before. We didn't use to use CCR. And restoring nodes for > us is a major pain in the rear as we only allow one-way root SSH, so we have a > number of useful little scripts to work around problems like this. > > Assuming that you have all the necessary files copied to the correct > places, you can manually kick off CCR. > > I think my script does something like: > > (copy the encryption key info) > > scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ > > scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ > > scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ > > :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor > > you should then see like 2 copies of it running under mmksh. > > Ed > > > On Wed, 20 Sep 2017 13:55:28 +0000 > "Buterbaugh, Kevin L" > wrote: > > Hi All, > > testnsd1 and testnsd3 both had hardware issues (power supply and internal HD > respectively). Given that they were 12 year old boxes, we decided to replace > them with other boxes that are a mere 7 years old ? keep in mind that this is > a test cluster. > > Disabling CCR does not work, even with the undocumented ??force? option: > > /var/mmfs/gen > root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force > mmchcluster: Unable to obtain the GPFS configuration file lock. > mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. > mmchcluster: Processing continues without lock protection. > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key > fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key > fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? 
The authenticity of host 'vmp608.vampire > (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp612.vampire > (10.0.21.12)' can't be established. ECDSA key fingerprint is > SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is > MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's password: > testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire > remote shell process had return code 255. testnsd1.vampire: Host key > verification failed. mmdsh: testnsd1.vampire remote shell process had return > code 255. vmp609.vampire: Host key verification failed. mmdsh: > vmp609.vampire remote shell process had return code 255. vmp608.vampire: > Host key verification failed. mmdsh: vmp608.vampire remote shell process had > return code 255. vmp612.vampire: Host key verification failed. mmdsh: > vmp612.vampire remote shell process had return code 255. > > root at vmp610.vampire's password: vmp610.vampire: > Permission denied, please try again. > > root at vmp610.vampire's password: vmp610.vampire: > Permission denied, please try again. > > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. > > Verifying GPFS is stopped on all nodes ... > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key > fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key > fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp609.vampire > (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire > (10.0.6.213)' can't be established. ECDSA key fingerprint is > SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is > MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's password: > root at vmp610.vampire's password: > root at vmp610.vampire's password: > > testnsd3.vampire: Host key verification failed. > mmdsh: testnsd3.vampire remote shell process had return code 255. > vmp612.vampire: Host key verification failed. > mmdsh: vmp612.vampire remote shell process had return code 255. > vmp608.vampire: Host key verification failed. > mmdsh: vmp608.vampire remote shell process had return code 255. 
> vmp609.vampire: Host key verification failed. > mmdsh: vmp609.vampire remote shell process had return code 255. > testnsd1.vampire: Host key verification failed. > mmdsh: testnsd1.vampire remote shell process had return code 255. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. mmchcluster: Command failed. > Examine previous error messages to determine cause. /var/mmfs/gen > root at testnsd2# > > I believe that part of the problem may be that there are 4 client nodes that > were removed from the cluster without removing them from the cluster (done by > another SysAdmin who was in a hurry to repurpose those machines). They?re up > and pingable but not reachable by GPFS anymore, which I?m pretty sure is > making things worse. > > Nor does Loic?s suggestion of running mmcommon work (but thanks for the > suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to > start the cluster up failed: > > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# > > Thanks. > > Kevin > > On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > > wrote: > > > Hi Kevin, > > Let's me try to understand the problem you have. What's the meaning of node > died here. Are you mean that there are some hardware/OS issue which cannot be > fixed and OS cannot be up anymore? > > I agree with Bob that you can have a try to disable CCR temporally, restore > cluster configuration and enable it again. > > Such as: > > 1. Login to a node which has proper GPFS config, e.g NodeA > 2. Shutdown daemon in all client cluster. > 3. mmchcluster --ccr-disable -p NodeA > 4. mmsdrrestore -a -p NodeA > 5. mmauth genkey propagate -N testnsd1, testnsd3 > 6. mmchcluster --ccr-enable > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in other > countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run > across this before, and it?s because of a bug (as I recall) having to do with > CCR and > > From: "Oesterlin, Robert" > > To: gpfsug > main discussion list > > > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for > the count? 
Sent by: > gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > OK ? I?ve run across this before, and it?s because of a bug (as I recall) > having to do with CCR and quorum. What I think you can do is set the cluster > to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back > up and then re-enable ccr. > > I?ll see if I can find this in one of the recent 4.2 release nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > > > on behalf of "Buterbaugh, Kevin L" > > > Reply-To: gpfsug main discussion list > > > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > > > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? > > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD > servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while > back. I did nothing about it at the time because it was due to be life-cycled > as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now > resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve > done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also > done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from > testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to > testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? > ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any > crucial details here ? does anyone have an idea what I need to do? Thanks? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 > > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > > > > ? 
> Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From jonathon.anderson at colorado.edu Wed Sep 20 19:55:04 2017 From: jonathon.anderson at colorado.edu (Jonathon A Anderson) Date: Wed, 20 Sep 2017 18:55:04 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? 
~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... 
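For anyone collecting the pieces from this thread: on a CES protocol node, the NFSv3-only, userdefined-auth sequence sketched above might look roughly like the following (the export path and client wildcard are placeholders, not taken from any particular cluster):

/usr/lpp/mmfs/bin/mmces service disable smb
/usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined
# NFSv3-only export; adjust path, client pattern and squash options for your environment
/usr/lpp/mmfs/bin/mmnfs export add /fs_gpfs01 --client '*(Access_Type=RW,Protocols=3,Squash=no_root_squash)'
/usr/lpp/mmfs/bin/mmnfs export list

As noted above, this trusts whatever UID/GID the NFS client presents, so it is only sensible when all exporting is to clients in the same administrative domain.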
> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss From ewahl at osc.edu Wed Sep 20 20:07:39 2017 From: ewahl at osc.edu (Edward Wahl) Date: Wed, 20 Sep 2017 15:07:39 -0400 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> Message-ID: <20170920150739.39f0a4a0@osc.edu> So who was the ccrmaster before? What is/was the quorum config? (tiebreaker disks?) what does 'mmccr check' say? Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info from the error? Ed On Wed, 20 Sep 2017 16:27:48 +0000 "Buterbaugh, Kevin L" wrote: > Hi Ed, > > Thanks for the suggestion ? that?s basically what I had done yesterday after > Googling and getting a hit or two on the IBM DeveloperWorks site. 
I?m > including some output below which seems to show that I?ve got everything set > up but it?s still not working. > > Am I missing something? We don?t use CCR on our production cluster (and this > experience doesn?t make me eager to do so!), so I?m not that familiar with > it... > > Kevin > > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v > grep" | sort testdellnode1: root 2583 1 0 May30 ? > 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testdellnode1: root 6694 2583 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 2023 5828 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testgateway: root 5828 1 0 Sep18 ? > 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: > root 19356 4628 0 11:19 tty1 > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: > root 4628 1 0 Sep19 tty1 > 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: > root 22149 2983 0 11:16 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: > root 2983 1 0 Sep18 ? > 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: > root 15685 6557 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: > root 6557 1 0 Sep19 ? > 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 29424 6512 0 11:19 ? > 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 > testsched: root 6512 1 0 Sep18 ? > 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor > 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR > quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr > fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous > error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh > -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: > drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 > root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root > root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: > drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. > 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 1 root root > 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root > root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 > committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks > testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: > -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 > testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: > drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 > root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root > 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 > 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes > testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached > testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: > -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root > root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 > 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 
2 root root 4096 > Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 > cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes > testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes > 3,0,10.0.6.215,,testnsd3.vampire > 1,0,10.0.6.213,,testnsd1.vampire > 2,0,10.0.6.214,,testnsd2.vampire > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" > testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs > testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs > testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs > testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 > 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug > 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 1 root root 20360 Aug > 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" > testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs > /var/mmfs/gen > root at testnsd2# mmdsh -F /tmp/cluster.hostnames > "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testdellnode1: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 > testgateway: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: > ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen > root at testnsd2# > > On Sep 20, 2017, at 10:48 AM, Edward Wahl > > wrote: > > I've run into this before. We didn't use to use CCR. And restoring nodes for > us is a major pain in the rear as we only allow one-way root SSH, so we have a > number of useful little scripts to work around problems like this. > > Assuming that you have all the necessary files copied to the correct > places, you can manually kick off CCR. > > I think my script does something like: > > (copy the encryption key info) > > scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ > > scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ > > scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ > > :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor > > you should then see like 2 copies of it running under mmksh. > > Ed > > > On Wed, 20 Sep 2017 13:55:28 +0000 > "Buterbaugh, Kevin L" > > > wrote: > > Hi All, > > testnsd1 and testnsd3 both had hardware issues (power supply and internal HD > respectively). Given that they were 12 year old boxes, we decided to replace > them with other boxes that are a mere 7 years old ? keep in mind that this is > a test cluster. > > Disabling CCR does not work, even with the undocumented ??force? option: > > /var/mmfs/gen > root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force > mmchcluster: Unable to obtain the GPFS configuration file lock. > mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. > mmchcluster: Processing continues without lock protection. 
> The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key > fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key > fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp608.vampire > (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp612.vampire > (10.0.21.12)' can't be established. ECDSA key fingerprint is > SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is > MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's > password: testnsd3.vampire: Host key verification failed. mmdsh: > testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: > Host key verification failed. mmdsh: testnsd1.vampire remote shell process > had return code 255. vmp609.vampire: Host key verification failed. mmdsh: > vmp609.vampire remote shell process had return code 255. vmp608.vampire: > Host key verification failed. mmdsh: vmp608.vampire remote shell process had > return code 255. vmp612.vampire: Host key verification failed. mmdsh: > vmp612.vampire remote shell process had return code 255. > > root at vmp610.vampire's > password: vmp610.vampire: Permission denied, please try again. > > root at vmp610.vampire's > password: vmp610.vampire: Permission denied, please try again. > > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. > > Verifying GPFS is stopped on all nodes ... > The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. > ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. > ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. > Are you sure you want to continue connecting (yes/no)? The authenticity of > host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key > fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key > fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you > sure you want to continue connecting (yes/no)? The authenticity of host > 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is > SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is > MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'vmp609.vampire > (10.0.21.9)' can't be established. ECDSA key fingerprint is > SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. 
ECDSA key fingerprint is > MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to > continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire > (10.0.6.213)' can't be established. ECDSA key fingerprint is > SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is > MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to > continue connecting (yes/no)? > root at vmp610.vampire's > password: > root at vmp610.vampire's > password: > root at vmp610.vampire's > password: > > testnsd3.vampire: Host key verification failed. > mmdsh: testnsd3.vampire remote shell process had return code 255. > vmp612.vampire: Host key verification failed. > mmdsh: vmp612.vampire remote shell process had return code 255. > vmp608.vampire: Host key verification failed. > mmdsh: vmp608.vampire remote shell process had return code 255. > vmp609.vampire: Host key verification failed. > mmdsh: vmp609.vampire remote shell process had return code 255. > testnsd1.vampire: Host key verification failed. > mmdsh: testnsd1.vampire remote shell process had return code 255. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied, please try again. > vmp610.vampire: Permission denied > (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire > remote shell process had return code 255. mmchcluster: Command failed. > Examine previous error messages to determine cause. /var/mmfs/gen > root at testnsd2# > > I believe that part of the problem may be that there are 4 client nodes that > were removed from the cluster without removing them from the cluster (done by > another SysAdmin who was in a hurry to repurpose those machines). They?re up > and pingable but not reachable by GPFS anymore, which I?m pretty sure is > making things worse. > > Nor does Loic?s suggestion of running mmcommon work (but thanks for the > suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to > start the cluster up failed: > > /var/mmfs/gen > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /var/mmfs/gen > root at testnsd2# > > Thanks. > > Kevin > > On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > > wrote: > > > Hi Kevin, > > Let's me try to understand the problem you have. What's the meaning of node > died here. Are you mean that there are some hardware/OS issue which cannot be > fixed and OS cannot be up anymore? > > I agree with Bob that you can have a try to disable CCR temporally, restore > cluster configuration and enable it again. > > Such as: > > 1. Login to a node which has proper GPFS config, e.g NodeA > 2. Shutdown daemon in all client cluster. > 3. mmchcluster --ccr-disable -p NodeA > 4. mmsdrrestore -a -p NodeA > 5. mmauth genkey propagate -N testnsd1, testnsd3 > 6. 
mmchcluster --ccr-enable > > Regards, The Spectrum Scale (GPFS) team > > ------------------------------------------------------------------------------------------------------------------ > If you feel that your question can benefit other users of Spectrum Scale > (GPFS), then please post it to the public IBM developerWroks Forum at > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. > > If your query concerns a potential software error in Spectrum Scale (GPFS) > and you have an IBM software maintenance contract please contact > 1-800-237-5511 in the United States or your local IBM Service Center in other > countries. > > The forum is informally monitored as time permits and should not be used for > priority messages to the Spectrum Scale (GPFS) team. > > "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run > across this before, and it?s because of a bug (as I recall) having to do with > CCR and > > From: "Oesterlin, Robert" > > > To: gpfsug main discussion list > > > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for > the count? Sent by: > gpfsug-discuss-bounces at spectrumscale.org > > ________________________________ > > > > OK ? I?ve run across this before, and it?s because of a bug (as I recall) > having to do with CCR and quorum. What I think you can do is set the cluster > to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back > up and then re-enable ccr. > > I?ll see if I can find this in one of the recent 4.2 release nodes. > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > > From: > > > on behalf of "Buterbaugh, Kevin L" > > > Reply-To: gpfsug main discussion list > > > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > > > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? > > Hi All, > > We have a small test cluster that is CCR enabled. It only had/has 3 NSD > servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while > back. I did nothing about it at the time because it was due to be life-cycled > as soon as I finished a couple of higher priority projects. > > Yesterday, testnsd1 also died, which took the whole cluster down. So now > resolving this has become higher priority? ;-) > > I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve > done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also > done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from > testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to > testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? > ssh without a password between those 3 boxes is fine. > > However, when I try to startup GPFS ? or run any GPFS command I get: > > /root > root at testnsd2# mmstartup -a > get file failed: Not enough CCR quorum nodes available (err 809) > gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 > mmstartup: Command failed. Examine previous error messages to determine cause. > /root > root at testnsd2# > > I?ve got to run to a meeting right now, so I hope I?m not leaving out any > crucial details here ? 
does anyone have an idea what I need to do? Thanks? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu > - (615)875-9633 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at > spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 > > > > > -- > > Ed Wahl > Ohio Supercomputer Center > 614-292-9302 > > > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and Education > Kevin.Buterbaugh at vanderbilt.edu - > (615)875-9633 > > > -- Ed Wahl Ohio Supercomputer Center 614-292-9302 From tarak.patel at canada.ca Wed Sep 20 21:23:00 2017 From: tarak.patel at canada.ca (Patel, Tarak (SSC/SPC)) Date: Wed, 20 Sep 2017 20:23:00 +0000 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu> , , Message-ID: Hi, Recently we deployed 3 sets of CES nodes where we are using LDAP for authentication service. We had to create a user in ldap which was used by 'mmuserauth service create' command. Note that SMB needs to be disabled ('mmces service disable smb') if not being used before issuing 'mmuserauth service create'. By default, CES deployment enables SMB (' spectrumscale config protocols'). Tarak -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September, 2017 14:55 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." 
I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. 
mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but not > for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the > NFS client tells you". This of course only works sanely if each NFS > export is only to a set of machines in the same administrative domain > that manages their UID/GIDs. Exporting to two sets of machines that > don't coordinate their UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpi > Bv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiy > liSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ > 0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGV > srSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwC > YeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbj > XI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuv > EeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discus > s > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org 
http://secure-web.cisco.com/1w-ldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-fNVoZ49ioTlOwQoRbyC_MjpoBPlD3jfpV_knuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM_jYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-Vs_qLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4_MtVXKzQRwQqemODDjSa5my7zl98vobN_ui-cRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-Cl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A/http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From chetkulk at in.ibm.com Thu Sep 21 06:33:53 2017 From: chetkulk at in.ibm.com (Chetan R Kulkarni) Date: Thu, 21 Sep 2017 11:03:53 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: <27469.1500914134@turing-police.cc.vt.edu>, , Message-ID: Hi Jonathon, I can configure file userdefined authentication with only NFS enabled/running on my test setup (SMB was disabled). Please check if following steps help fix your issue: 1> remove existing file auth if any /usr/lpp/mmfs/bin/mmuserauth service remove --data-access-method file 2> disable smb service /usr/lpp/mmfs/bin/mmces service disable smb /usr/lpp/mmfs/bin/mmces service list -a 3> configure userdefined file auth /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined 4> if above fails retry mmuserauth in debug mode as below and please share error log /tmp/userdefined.log. Also share spectrum scale version you are running with. export DEBUG=1; /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined > /tmp/userdefined.log 2>&1; unset DEBUG /usr/lpp/mmfs/bin/mmdiag --version 5> if mmuserauth succeeds in step 3> above; you also need to correct your mmnfs cli command as below. You missed to type in Access_Type= and Squash= in client definition. mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu (Access_Type=rw,Squash=root_squash);dtn*.rc.int.colorado.edu (Access_Type=rw,Squash=root_squash)' Thanks, Chetan. From: Jonathon A Anderson To: gpfsug main discussion list Date: 09/21/2017 12:25 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. 
HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu (rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Varun Mittal3 Sent: Tuesday, July 25, 2017 9:44:24 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sorry a small typo: mmuserauth service create --data-access-method file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated nod]Varun Mittal3---26/07/2017 09:12:27 AM---Hi Did you try to run this command from a CES designated node ? From: Varun Mittal3/India/IBM To: gpfsug main discussion list Date: 26/07/2017 09:12 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication ________________________________ Hi Did you try to run this command from a CES designated node ? If no, then try executing the command from a CES node: mmuserauth service create --data-access-type file --type userdefined Best regards, Varun Mittal Cloud/Object Scrum @ Spectrum Scale ETZ, Pune [Inactive hide details for Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive err]Ilan Schwarts ---25/07/2017 10:22:26 AM---Hi, While trying to add the userdefined auth, I receive error that SMB From: Ilan Schwarts To: gpfsug main discussion list Date: 25/07/2017 10:22 AM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, While trying to add the userdefined auth, I receive error that SMB service not enabled. I am currently working on a spectrum scale cluster, and i dont have the SMB package, I am waiting for it.. is there a way to export NFSv3 using the spectrum scale tools without SMB package ? [root at LH20-GPFS1 ~]# mmuserauth service create --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. 
Examine previous error messages to determine cause. I exported the NFS via /etc/exports and than ./exportfs -a .. It works fine, I was able to mount the gpfs export from another machine.. this was my work-around since the spectrum scale tools failed to export NFSv3 On Mon, Jul 24, 2017 at 7:35 PM, wrote: > On Mon, 24 Jul 2017 13:36:41 +0300, Ilan Schwarts said: >> Hi, >> I have gpfs with 2 Nodes (redhat). >> I am trying to create NFS share - So I would be able to mount and >> access it from another linux machine. > >> While trying to create NFS (I execute the following): >> [root at LH20-GPFS1 ~]# mmnfs export add /fs_gpfs01 -c "* >> Access_Type=RW,Protocols=3:4,Squash=no_root_squash)" > > You can get away with little to no authentication for NFSv3, but > not for NFSv4. Try with Protocols=3 only and > > mmuserauth service create --type userdefined > > that should get you Unix-y NFSv3 UID/GID support and "trust what the NFS > client tells you". This of course only works sanely if each NFS export is > only to a set of machines in the same administrative domain that manages their > UID/GIDs. Exporting to two sets of machines that don't coordinate their > UID/GID space is, of course, where hilarity and hijinks ensue.... > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= > -- - Ilan Schwarts _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org 
https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__secure-2Dweb.cisco.com_1w-2Dldlm8bq5oYiMuHk7N1T32DW18VkxjnfkMWDjdpiBv1WJToz9PCO1zVyGvWIVP3-2DfNVoZ49ioTlOwQoRbyC-5FMjpoBPlD3jfpV-5FknuzViyRNZiyliSGH9rx5nGVvTLSPrjIwzvUIZDadCuNXgM-5FjYCVBE2RsDpg8o4LCjJv9QIZPbyHlKrkoQ0sNGXOZPYT7gxpo8sVjoxKQbOgQzkDnPMQoa2a8miTP19fLkB5HqV5cJv3U-2DVs-5FqLtyJGVsrSgLu2wQoDMxymVwm5mcRWO6MYfl4-5FMtVXKzQRwQqemODDjSa5my7zl98vobN-5Fui-2DcRwCYeVbOwEd57CjaYRzKcBu6Dbd2TmGar7JUNWVtg1dZPTv6uothD6V4g0Q0MuXZsBICzfxbjXI9WlB3Tiu3ty0oxenYrM8yxE-2DCl57VhmV4KlY18EHMFncfLtRkk9cTHtfrEjiXBROhCuvEeqhrYT6A_http-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=DNgplGZ30awqnvnd4Ju39pzv3rlk18Kf6NGe7iDX4Mk&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=uic-29lyJ5TCiTRi0FyznYhKJx5I7Vzu80WyYuZ4_iM&m=VqyIekg3Wtz0ukw-QSXsEXOoi5rZ0gnMeIPyFNGpllA&s=AliY037R_W1y8Ym6nPI1XDP2yCq47JwtTPhj9IppwOM&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From andreas.mattsson at maxiv.lu.se Thu Sep 21 13:09:29 2017 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 21 Sep 2017 12:09:29 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: References: , Message-ID: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se> Since I solved this old issue a long time ago, I'd thought I'd come back and report the solution in case someone else encounters similar problems in the future. Original problem reported by users: Copying files between folders on NFS exports from a CES server gave random timestamps on the files. Also, apart from the initial reported problem, there where issues where users sometimes couldn't change or delete files that they where owners of. Background: We have a Active Directory with RFC2307 posix attributes populated, and use the built in Winbind-based AD authentication with RFC2307 ID mapping of our Spectrum Scale CES protocol servers. All our Linux clients and servers are also AD integrated, using Nslcd and nss-pam-ldapd. 
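(A quick way to see the name/case mismatch described in the next paragraphs is to resolve the same group on an NFS client and on a CES protocol node and compare the results; the user name below is a made-up example and the GID is the one used for illustration further down:

getent group 1234      # client (nslcd) may return "UserGroup:*:1234:" while the CES node (winbind) returns "usergroup:*:1234:"
id -Gn someuser        # compare the case of the group names each side reports

If the two sides disagree only in case, you are looking at the situation described below. Remember that any change to the ID-mapping setup usually also needs the relevant caches cleared, for example by restarting nslcd on the clients; sssd and winbind have their own cache-flushing tools.)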
Trigger: If a user was part of a AD group with a mixed case name, and this group gave access to a folder, and the NFS mount was done using NFSv4, the behavior in my original post occurred when copying or changing files in that folder. Cause: Active Directory handle LDAP-requests case insensitive, but results are returned with case retained. Winbind and SSSD-AD converts groups and usernames to lower case. Nslcd retains case. We run NFS with managed GIDs. Managed GIDs in NFSv3 seems to be handled case insensitive, or to ignore the actual group name after it has resolved the GID-number of the group, while NFSv4 seems to handle group names case sensitive and check the actual group name for certain operations even if the GID-number matches. Don't fully understand the mechanism behind why certain file operations would work but others not, but in essence a user would be part of a group called "UserGroup" with GID-number 1234 in AD and on the client, but would be part of a group called "usergroup" with GID-number 1234 on the CES server. Any operation that's authorized on the GID-number, or a case insensitive lookup of the group name, would work. Any operation authorized by a case sensitive group lookup would fail. Three different workarounds where found to work: 1. Rename groups and users to lower case in AD 2. Change from Nslcd to either SSSD or Winbind on the clients 3. Change from NFSv4 to NFSv3 when mounting NFS Remember to clear ID-mapping caches. Regards, Andreas ___________________________________ [https://mail.google.com/mail/u/0/?ui=2&ik=b0a6f02971&view=att&th=14618fab2daf0e10&attid=0.1.1&disp=emb&zw&atsh=1] Andreas Mattsson System Engineer MAX IV Laboratory Lund University Tel: +46-706-649544 E-mail: andreas.mattsson at maxlab.lu.se ________________________________ Fr?n: gpfsug-discuss-bounces at spectrumscale.org f?r Stephen Ulmer Skickat: den 3 februari 2017 14:35:21 Till: gpfsug main discussion list ?mne: Re: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES Does the cp actually complete? As in, does it copy all of the blocks? What?s the exit code? A cp?d file should have ?new? metadata. That is, it should have it?s own dates, owners, etc. (not necessarily copied from the source file). I ran ?strace cp foo1 foo2?, and it was pretty instructive, maybe that would get you more info. On CentOS strace is in it?s own package, YMMV. -- Stephen On Feb 3, 2017, at 8:19 AM, Andreas Mattsson > wrote: That works. ?touch test100? Feb 3 14:16 test100 ?cp test100 test101? Feb 3 14:16 test100 Apr 21 2027 test101 ?touch ?r test100 test101? Feb 3 14:16 test100 Feb 3 14:16 test101 /Andreas That?s a cool one. :) What if you use the "random date" file as a time reference to touch another file (like, 'touch -r file02 file03?)? -- Stephen On Feb 3, 2017, at 7:46 AM, Andreas Mattsson > wrote: I?m having some really strange timestamp behaviour when doing file operations on NFS mounts shared via CES on spectrum scale 4.2.1.1 The NFS clients are up to date Centos and Debian machines. All Scale servers and NFS clients have correct date and time via NTP. Creating a file, for instance ?touch file00?, gives correct timestamp. Moving the file, ?mv file00 file01?, gives correct timestamp Copying the file, ?cp file01 file02?, gives a random timestamp anywhere in time, for instance Oct 12 2095 or Feb 29 1976 or something similar. This is only via NFS. Copying the file via a native gpfs-mount or via SMB gives a correct timestamp. 
Doing the same operation over NFS to other NFS-servers works correct, it is only when operating on the NFS-share from the Spectrum Scale CES the issue occurs. Have anyone seen this before? Regards, Andreas Mattsson _____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 225 94 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylorm at us.ibm.com Thu Sep 21 15:33:00 2017 From: taylorm at us.ibm.com (Michael L Taylor) Date: Thu, 21 Sep 2017 07:33:00 -0700 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Hi Jonathon, We were able to run this scenario successfully in our lab at the latest released 4.2.3.4. # /usr/lpp/mmfs/bin/mmdiag --version === mmdiag: version === Current GPFS build: "4.2.3.4 ". # /usr/lpp/mmfs/bin/mmces service list -a Enabled services: NFS node1.test.ibm.com: NFS is running # /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined File authentication configuration completed successfully. # rpm -qa | grep gpfs gpfs.ext-4.2.3-4.x86_64 gpfs.docs-4.2.3-4.noarch gpfs.gskit-8.0.50-75.x86_64 gpfs.gpl-4.2.3-4.noarch gpfs.msg.en_US-4.2.3-4.noarch nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 gpfs.base-4.2.3-4.x86_64 # rpm -qa | grep nfs-gan nfs-ganesha-utils-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/20/2017 12:07 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 42 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=BpVUgvFT2Qwgw0hveEgQaHFwn2mjeQjeBrkXHX_aC0A&m=2oGcWc1xx6zOclryoU2BdJykABuIR118zXTmSAA8msU&s=7q0JMYVHMSGlUAYquNMlrDRF6BDj6-76Oc4VbXrvlHE&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: export nfs share on gpfs with no authentication (Jonathon A Anderson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 20 Sep 2017 18:55:04 +0000 From: Jonathon A Anderson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Content-Type: text/plain; charset="us-ascii" I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. 
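For the record, the NFS-only sequence that does work once the cluster is at 4.2.3.4 (per the lab verification quoted earlier in this digest) is roughly the following; the export path and client spec are only placeholders patterned on the examples in this thread, and on 4.2.2-level code the mmuserauth step still insists on SMB being enabled, which is the error quoted above:

mmces service enable NFS
mmuserauth service create --data-access-method file --type userdefined
mmnfs export add /gpfs/fs1 --client "clientA(Access_Type=RW,Protocols=3,Squash=no_root_squash)"
mmnfs export list

With userdefined authentication the NFS clients are trusted for their own UID/GID handling, so sticking to NFSv3 (Protocols=3), as suggested earlier in this thread, keeps the setup simple.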
________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu (rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Sep 21 18:09:52 2017 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 21 Sep 2017 17:09:52 +0000 Subject: [gpfsug-discuss] CCR cluster down for the count? In-Reply-To: <20170920150739.39f0a4a0@osc.edu> References: <20170920114844.6bf9f27b@osc.edu> <28D10363-A8F3-439B-81DB-EB0E4E750FFD@vanderbilt.edu> <20170920150739.39f0a4a0@osc.edu> Message-ID: Hi All, Ralf Eberhard of IBM helped me resolve this off list. The key was to temporarily make testnsd1 and testnsd3 not be quorum nodes by making sure GPFS was down and then executing: mmchnode --nonquorum -N testnsd1,testnsd3 --force That gave me some scary messages about overriding normal GPFS quorum semantics, but once that was done I was able to run an 'mmstartup -a' and bring up the cluster! Once it was up and I had verified things were working properly I then shut it back down so that I could rerun the mmchnode (without the --force) to make testnsd1 and testnsd3 quorum nodes again. Thanks to all who helped me out here... Kevin On Sep 20, 2017, at 2:07 PM, Edward Wahl > wrote: So who was the ccrmaster before? What is/was the quorum config? (tiebreaker disks?) 
what does 'mmccr check' say? Have you set DEBUG=1 and tried mmstartup to see if it teases out any more info from the error? Ed On Wed, 20 Sep 2017 16:27:48 +0000 "Buterbaugh, Kevin L" > wrote: Hi Ed, Thanks for the suggestion ? that?s basically what I had done yesterday after Googling and getting a hit or two on the IBM DeveloperWorks site. I?m including some output below which seems to show that I?ve got everything set up but it?s still not working. Am I missing something? We don?t use CCR on our production cluster (and this experience doesn?t make me eager to do so!), so I?m not that familiar with it... Kevin /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ps -ef | grep mmccr | grep -v grep" | sort testdellnode1: root 2583 1 0 May30 ? 00:10:33 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testdellnode1: root 6694 2583 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 2023 5828 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testgateway: root 5828 1 0 Sep18 ? 00:00:19 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 19356 4628 0 11:19 tty1 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd1: root 4628 1 0 Sep19 tty1 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 22149 2983 0 11:16 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd2: root 2983 1 0 Sep18 ? 00:00:27 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 15685 6557 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testnsd3: root 6557 1 0 Sep19 ? 00:00:04 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 29424 6512 0 11:19 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 testsched: root 6512 1 0 Sep18 ? 00:00:20 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15 /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/ccr" | sort testdellnode1: drwxr-xr-x 2 root root 4096 Mar 3 2017 cached testdellnode1: drwxr-xr-x 2 root root 4096 Nov 10 2016 committed testdellnode1: -rw-r--r-- 1 root root 99 Nov 10 2016 ccr.nodes testdellnode1: total 12 testgateway: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testgateway: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testgateway: -rw-r--r--. 
1 root root 99 Jun 29 2016 ccr.nodes testgateway: total 12 testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 cached testnsd1: drwxr-xr-x 2 root root 6 Sep 19 15:38 committed testnsd1: -rw-r--r-- 1 root root 0 Sep 19 15:39 ccr.disks testnsd1: -rw-r--r-- 1 root root 4 Sep 19 15:38 ccr.noauth testnsd1: -rw-r--r-- 1 root root 99 Sep 19 15:39 ccr.nodes testnsd1: total 8 testnsd2: drwxr-xr-x 2 root root 22 Mar 3 2017 cached testnsd2: drwxr-xr-x 2 root root 4096 Sep 18 11:49 committed testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.1 testnsd2: -rw------- 1 root root 4096 Sep 18 11:50 ccr.paxos.2 testnsd2: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd2: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd2: total 16 testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 cached testnsd3: drwxr-xr-x 2 root root 6 Sep 19 15:41 committed testnsd3: -rw-r--r-- 1 root root 0 Jun 29 2016 ccr.disks testnsd3: -rw-r--r-- 1 root root 4 Sep 19 15:41 ccr.noauth testnsd3: -rw-r--r-- 1 root root 99 Jun 29 2016 ccr.nodes testnsd3: total 8 testsched: drwxr-xr-x. 2 root root 4096 Jun 29 2016 committed testsched: drwxr-xr-x. 2 root root 4096 Mar 3 2017 cached testsched: -rw-r--r--. 1 root root 99 Jun 29 2016 ccr.nodes testsched: total 12 /var/mmfs/gen root at testnsd2# more ../ccr/ccr.nodes 3,0,10.0.6.215,,testnsd3.vampire 1,0,10.0.6.213,,testnsd1.vampire 2,0,10.0.6.214,,testnsd2.vampire /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "ls -l /var/mmfs/gen/mmsdrfs" testnsd1: -rw-r--r-- 1 root root 20360 Sep 19 15:21 /var/mmfs/gen/mmsdrfs testnsd3: -rw-r--r-- 1 root root 20360 Sep 19 15:34 /var/mmfs/gen/mmsdrfs testnsd2: -rw-r--r-- 1 root root 20360 Aug 25 17:34 /var/mmfs/gen/mmsdrfs testdellnode1: -rw-r--r-- 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testgateway: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs testsched: -rw-r--r--. 1 root root 20360 Aug 25 17:43 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/gen/mmsdrfs" testnsd1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd3: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testnsd2: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testdellnode1: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testgateway: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs testsched: 7120c79d9d767466c7629763abb7f730 /var/mmfs/gen/mmsdrfs /var/mmfs/gen root at testnsd2# mmdsh -F /tmp/cluster.hostnames "md5sum /var/mmfs/ssl/stage/genkeyData1" testnsd3: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testnsd2: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testdellnode1: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testgateway: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 testsched: ee6d345a87202a9f9d613e4862c92811 /var/mmfs/ssl/stage/genkeyData1 /var/mmfs/gen root at testnsd2# On Sep 20, 2017, at 10:48 AM, Edward Wahl > wrote: I've run into this before. We didn't use to use CCR. And restoring nodes for us is a major pain in the rear as we only allow one-way root SSH, so we have a number of useful little scripts to work around problems like this. Assuming that you have all the necessary files copied to the correct places, you can manually kick off CCR. 
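The scp fragment just below lost its destination hosts along the way; with <newnode> standing in as a placeholder for the node being restored, the copy-and-restart sequence amounts to roughly:

# run from a node that still has a good configuration (after copying the encryption key info)
scp /var/mmfs/ccr/ccr.nodes <newnode>:/var/mmfs/ccr/
scp /var/mmfs/gen/mmsdrfs <newnode>:/var/mmfs/gen/
scp /var/mmfs/ssl/stage/genkeyData1 <newnode>:/var/mmfs/ssl/stage/
ssh <newnode> /usr/lpp/mmfs/bin/mmcommon startCcrMonitor
# two mmccrmonitor processes should then be running under mmksh on <newnode>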
I think my script does something like: (copy the encryption key info) scp /var/mmfs/ccr/ccr.nodes :/var/mmfs/ccr/ scp /var/mmfs/gen/mmsdrfs :/var/mmfs/gen/ scp /var/mmfs/ssl/stage/genkeyData1 :/var/mmfs/ssl/stage/ :/usr/lpp/mmfs/bin/mmcommon startCcrMonitor you should then see like 2 copies of it running under mmksh. Ed On Wed, 20 Sep 2017 13:55:28 +0000 "Buterbaugh, Kevin L" > wrote: Hi All, testnsd1 and testnsd3 both had hardware issues (power supply and internal HD respectively). Given that they were 12 year old boxes, we decided to replace them with other boxes that are a mere 7 years old ? keep in mind that this is a test cluster. Disabling CCR does not work, even with the undocumented ??force? option: /var/mmfs/gen root at testnsd2# mmchcluster --ccr-disable -p testnsd2 -s testnsd1 --force mmchcluster: Unable to obtain the GPFS configuration file lock. mmchcluster: GPFS was unable to obtain a lock from node testnsd1.vampire. mmchcluster: Processing continues without lock protection. The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. root at vmp610.vampire's password: vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. Verifying GPFS is stopped on all nodes ... The authenticity of host 'testnsd3.vampire (10.0.6.215)' can't be established. 
ECDSA key fingerprint is SHA256:Ky1pkjsC/kvt4RA8PJuEh/W3vcxCJZplr2m1XHr+UwI. ECDSA key fingerprint is MD5:55:59:a0:2a:6e:a1:00:58:85:3d:ac:86:0e:cd:2a:8a. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp612.vampire (10.0.21.12)' can't be established. ECDSA key fingerprint is SHA256:zKXqPt8rIMZWSAYavKEuaAVIm31OGVovoWVU+dBTRPM. ECDSA key fingerprint is MD5:72:4d:fb:22:4e:b3:0e:04:37:be:16:74:ae:ea:05:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp608.vampire (10.0.21.8)' can't be established. ECDSA key fingerprint is SHA256:tvtNWN9b7/Qknb/Am8x7FzyMngi6R3f5SHBqATNtLzw. ECDSA key fingerprint is MD5:fc:4e:87:fb:09:82:cd:67:b0:7d:7f:c7:4b:83:b9:6c. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'vmp609.vampire (10.0.21.9)' can't be established. ECDSA key fingerprint is SHA256:/gX6eSp/shsRboVFcUFcNCtGSfbBIWQZ/CWjA6gb17Q. ECDSA key fingerprint is MD5:ca:4d:58:8c:91:28:25:7b:5b:b1:0d:a3:72:a3:00:bb. Are you sure you want to continue connecting (yes/no)? The authenticity of host 'testnsd1.vampire (10.0.6.213)' can't be established. ECDSA key fingerprint is SHA256:WPiTtyuyzhuv+lRRpgDjLuHpyHyk/W3+c5N9SabWvnE. ECDSA key fingerprint is MD5:26:26:2a:bf:e4:cb:1d:a8:27:35:96:ef:b5:96:e0:29. Are you sure you want to continue connecting (yes/no)? root at vmp610.vampire's password: root at vmp610.vampire's password: root at vmp610.vampire's password: testnsd3.vampire: Host key verification failed. mmdsh: testnsd3.vampire remote shell process had return code 255. vmp612.vampire: Host key verification failed. mmdsh: vmp612.vampire remote shell process had return code 255. vmp608.vampire: Host key verification failed. mmdsh: vmp608.vampire remote shell process had return code 255. vmp609.vampire: Host key verification failed. mmdsh: vmp609.vampire remote shell process had return code 255. testnsd1.vampire: Host key verification failed. mmdsh: testnsd1.vampire remote shell process had return code 255. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied, please try again. vmp610.vampire: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). mmdsh: vmp610.vampire remote shell process had return code 255. mmchcluster: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# I believe that part of the problem may be that there are 4 client nodes that were removed from the cluster without removing them from the cluster (done by another SysAdmin who was in a hurry to repurpose those machines). They?re up and pingable but not reachable by GPFS anymore, which I?m pretty sure is making things worse. Nor does Loic?s suggestion of running mmcommon work (but thanks for the suggestion!) ? actually the mmcommon part worked, but a subsequent attempt to start the cluster up failed: /var/mmfs/gen root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. Examine previous error messages to determine cause. /var/mmfs/gen root at testnsd2# Thanks. Kevin On Sep 19, 2017, at 10:07 PM, IBM Spectrum Scale > wrote: Hi Kevin, Let's me try to understand the problem you have. What's the meaning of node died here. Are you mean that there are some hardware/OS issue which cannot be fixed and OS cannot be up anymore? 
I agree with Bob that you can have a try to disable CCR temporally, restore cluster configuration and enable it again. Such as: 1. Login to a node which has proper GPFS config, e.g NodeA 2. Shutdown daemon in all client cluster. 3. mmchcluster --ccr-disable -p NodeA 4. mmsdrrestore -a -p NodeA 5. mmauth genkey propagate -N testnsd1, testnsd3 6. mmchcluster --ccr-enable Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ibm.com%2Fdeveloperworks%2Fcommunity%2Fforums%2Fhtml%2Fforum%3Fid%3D11111111-0000-0000-0000-000000000479&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=8OL9COHsb4M%2BZOyWta92acdO8K1Ez8HJfHbrCdDsmRs%3D&reserved=0. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. "Oesterlin, Robert" ---09/20/2017 07:39:55 AM---OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and From: "Oesterlin, Robert" > To: gpfsug main discussion list > Date: 09/20/2017 07:39 AM Subject: Re: [gpfsug-discuss] CCR cluster down for the count? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ OK ? I?ve run across this before, and it?s because of a bug (as I recall) having to do with CCR and quorum. What I think you can do is set the cluster to non-ccr (mmchcluster ?ccr-disable) with all the nodes down, bring it back up and then re-enable ccr. I?ll see if I can find this in one of the recent 4.2 release nodes. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: > on behalf of "Buterbaugh, Kevin L" > Reply-To: gpfsug main discussion list > Date: Tuesday, September 19, 2017 at 4:03 PM To: gpfsug main discussion list > Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count? Hi All, We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects. Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority? ;-) I took two other boxes and set them up as testnsd1 and 3, respectively. I?ve done a ?mmsdrrestore -p testnsd2 -R /usr/bin/scp? on both of them. I?ve also done a "mmccr setup -F? and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I?ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it?s not obvious from the above, networking is fine ? ssh without a password between those 3 boxes is fine. However, when I try to startup GPFS ? or run any GPFS command I get: /root root at testnsd2# mmstartup -a get file failed: Not enough CCR quorum nodes available (err 809) gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158 mmstartup: Command failed. 
Examine previous error messages to determine cause. /root root at testnsd2# I?ve got to run to a meeting right now, so I hope I?m not leaving out any crucial details here ? does anyone have an idea what I need to do? Thanks? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.proofpoint.com%2Fv2%2Furl%3Fu%3Dhttp-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss%26d%3DDwICAg%26c%3Djf_iaSHvJObTbx-siA1ZOg%26r%3DIbxtjdkPAM2Sbon4Lbbi4w%26m%3DmBSa534LB4C2zN59ZsJSlginQqfcrutinpAPYNDqU_Y%26s%3DYJEapknqzE2d9kwZzZuu6gEW0DzBoM-o94pXGEeCfuI%26e&data=02%7C01%7CKevin.Buterbaugh%40Vanderbilt.Edu%7C745cfeaac7264124bb8c08d5003f162a%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415193316350738&sdata=oQ4u%2BdyyYLY7HzaOqRPEGjUVhi7AQF%2BvbvnWA4bhuXE%3D&reserved=0= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7C494f0469ec084568b39608d4ffd4b8c2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636414736486816768&sdata=kBvEL7Kp2JMGuLIL4NX3UV7h3emaayQSbHr8O1F2CXc%3D&reserved=0 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -- Ed Wahl Ohio Supercomputer Center 614-292-9302 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cfabfdb4659d249e2d20308d5005ae1ab%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636415312700069585&sdata=Z59ik0w%2BaK6bV2JsDxSNt%2FsqwR1ESuqkXTQVBlRjDgw%3D&reserved=0 ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 21 19:49:29 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 21 Sep 2017 11:49:29 -0700 Subject: [gpfsug-discuss] User Meeting & SPXXL in NYC In-Reply-To: References: Message-ID: Registration space is getting tight. We decided on a room reconfiguration today to make a little more room. So if you tried to register and were told it was full try again. If it fills up again and you want to register, but can?t drop me an email and I?ll see what we can do. Best, Kristy > On Sep 20, 2017, at 9:00 AM, Kristy Kallback-Rose wrote: > > Thanks Doug. > > If you plan to go, *do register*. GPFS Day is free, but we need to know how many will attend. Register using the link on the HPCXXL event page below. > > Cheers, > Kristy > >> On Sep 20, 2017, at 1:28 AM, Douglas O'flaherty > wrote: >> >> >> Reminder that the SPXXL day on IBM Spectrum Scale in New York is open to all. It is Thursday the 28th. There is also a Power day on Wednesday. 
>> >> >> For more information >> http://hpcxxl.org/summer-2017-meeting-september-24-29-new-york-city/ >> >> Doug >> >> Mobile >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Fri Sep 22 23:08:58 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Fri, 22 Sep 2017 22:08:58 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se> References: <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se>, , Message-ID: An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Fri Sep 22 23:10:45 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Fri, 22 Sep 2017 22:10:45 +0000 Subject: [gpfsug-discuss] Strange timestamp behaviour on NFS via CES In-Reply-To: References: , <10541a8ed07149ecafdbe9ac03b807b8@maxiv.lu.se>, , Message-ID: An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Sun Sep 24 19:04:59 2017 From: bipcuds at gmail.com (Keith Ball) Date: Sun, 24 Sep 2017 14:04:59 -0400 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Message-ID: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. }, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. 
changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 <%28540%29%20557-7851> -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Sun Sep 24 20:29:10 2017 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Sun, 24 Sep 2017 12:29:10 -0700 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? In-Reply-To: References: Message-ID: Hi Keith, We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues with Zimon. However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time like with any round robin style data collection scheme. So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI which uses a postgres db (iirc, about to take off on a flight and can't check), or looked at the Grafana bridge which would get data into OpenTsdb format, again iirc. Anyway, just some things for consideration and a request to share back whatever you find out if it's off list. Thanks, getting stink eye to go to airplane mode. More later. Cheers Kristy On Sep 24, 2017 11:05 AM, "Keith Ball" wrote: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. 
}, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 <%28540%29%20557-7851> _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkomandu at in.ibm.com Mon Sep 25 06:26:15 2017 From: rkomandu at in.ibm.com (Ravi K Komanduri) Date: Mon, 25 Sep 2017 10:56:15 +0530 Subject: [gpfsug-discuss] export nfs share on gpfs with no authentication In-Reply-To: References: Message-ID: Jonathon, This requires SMB service when you are at 422 PTF2. As Mike pointed out if you upgrade to the 4.2.3-3/4 build you will no longer hit that issue With Regards, Ravi K Komanduri Email:rkomandu at in.ibm.com From: "Michael L Taylor" To: gpfsug-discuss at spectrumscale.org Date: 09/21/2017 08:03 PM Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Jonathon, We were able to run this scenario successfully in our lab at the latest released 4.2.3.4. # /usr/lpp/mmfs/bin/mmdiag --version === mmdiag: version === Current GPFS build: "4.2.3.4 ". # /usr/lpp/mmfs/bin/mmces service list -a Enabled services: NFS node1.test.ibm.com: NFS is running # /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined File authentication configuration completed successfully. 
# rpm -qa | grep gpfs gpfs.ext-4.2.3-4.x86_64 gpfs.docs-4.2.3-4.noarch gpfs.gskit-8.0.50-75.x86_64 gpfs.gpl-4.2.3-4.noarch gpfs.msg.en_US-4.2.3-4.noarch nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 gpfs.base-4.2.3-4.x86_64 # rpm -qa | grep nfs-gan nfs-ganesha-utils-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-2.3.2-0.ibm47.el7.x86_64 nfs-ganesha-gpfs-2.3.2-0.ibm47.el7.x86_64 From: gpfsug-discuss-request at spectrumscale.org To: gpfsug-discuss at spectrumscale.org Date: 09/20/2017 12:07 PM Subject: gpfsug-discuss Digest, Vol 68, Issue 42 Sent by: gpfsug-discuss-bounces at spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=BpVUgvFT2Qwgw0hveEgQaHFwn2mjeQjeBrkXHX_aC0A&m=2oGcWc1xx6zOclryoU2BdJykABuIR118zXTmSAA8msU&s=7q0JMYVHMSGlUAYquNMlrDRF6BDj6-76Oc4VbXrvlHE&e= or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: export nfs share on gpfs with no authentication (Jonathon A Anderson) ---------------------------------------------------------------------- Message: 1 Date: Wed, 20 Sep 2017 18:55:04 +0000 From: Jonathon A Anderson To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Message-ID: Content-Type: text/plain; charset="us-ascii" I shouldn't need SMB for authentication if I'm only using userdefined authentication, though. ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sobey, Richard A Sent: Wednesday, September 20, 2017 2:23:37 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication This sounded familiar to a problem I had to do with SMB and NFS. I've looked, and it's a different problem, but at the time I had this response. "That would be the case when Active Directory is configured for authentication. In that case the SMB service includes two aspects: One is the actual SMB file server, and the second one is the service for the Active Directory integration. Since NFS depends on authentication and id mapping services, it requires SMB to be running." I suspect the last paragraph is relevant in your case. HTH Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [ mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathon A Anderson Sent: 20 September 2017 06:13 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] export nfs share on gpfs with no authentication Returning to this thread cause I'm having the same issue as Ilan, above. I'm working on setting up CES in our environment after finally getting a blocking bugfix applied. I'm making it further now, but I'm getting an error when I try to create my export: --- [root at sgate2 ~]# mmnfs export add /gpfs/summit/scratch --client 'login*.rc.int.colorado.edu(rw,root_squash);dtn*.rc.int.colorado.edu(rw,root_squash)' mmcesfuncs.sh: Current authentication: none is invalid. This operation can not be completed without correct Authentication configuration. 
Configure authentication using: mmuserauth mmnfs export add: Command failed. Examine previous error messages to determine cause. --- When I try to configure mmuserauth, I get an error about not having SMB active; but I don't want to configure SMB, only NFS. --- [root at sgate2 ~]# /usr/lpp/mmfs/bin/mmuserauth service create --data-access-method file --type userdefined : SMB service not enabled. Enable SMB service first. mmcesuserauthcrservice: Command failed. Examine previous error messages to determine cause. --- How can I configure NFS exports with mmnfs without having to enable SMB? ~jonathon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=ilYETqcaNr1y1ulWWDPjVg_X9pt35O1eYBTyFwJP56Y&m=VW8gJLSqT4rru6lFZXxCFp-Y3ngi6IUydv5czoG8kTE&s=deIQZQr-qfqLqW377yNysTJI8y7QJOdbokVjlnDr2d8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Mon Sep 25 08:40:34 2017 From: john.hearns at asml.com (John Hearns) Date: Mon, 25 Sep 2017 07:40:34 +0000 Subject: [gpfsug-discuss] SPectrum Scale on AWS Message-ID: I guess this is not news on this list, however I did see a reference to SpectrumScale on The Register this morning, which linked to this paper: https://s3.amazonaws.com/quickstart-reference/ibm/spectrum/scale/latest/doc/ibm-spectrum-scale-on-the-aws-cloud.pdf The article is here https://www.theregister.co.uk/2017/09/25/storage_super_club_sandwich/ 12 Terabyte Helium drives now available. -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikeowen at thinkboxsoftware.com Mon Sep 25 10:26:21 2017 From: mikeowen at thinkboxsoftware.com (Mike Owen) Date: Mon, 25 Sep 2017 10:26:21 +0100 Subject: [gpfsug-discuss] SPectrum Scale on AWS In-Reply-To: References: Message-ID: Full PR release below: https://aws.amazon.com/about-aws/whats-new/2017/09/deploy-ibm-spectrum-scale-on-the-aws-cloud-with-new-quick-start/ Posted On: Sep 13, 2017 This new Quick Start automatically deploys a highly available IBM Spectrum Scale cluster with replication on the Amazon Web Services (AWS) Cloud, into a configuration of your choice. (A small cluster can be deployed in about 25 minutes.) IBM Spectrum Scale is a flexible, software-defined storage solution that can be deployed as highly available, high-performance file storage. 
It can scale in several dimensions, including performance (bandwidth and IOPS), capacity, and number of nodes that can mount the file system. The product?s high performance and scalability helps address the needs of applications whose performance (or performance-to-capacity ratio) demands cannot be met by traditional scale-up storage systems. The IBM Spectrum Scale software is being made available through a 90-day trial license evaluation program. This Quick Start automates the deployment of IBM Spectrum Scale on AWS for users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. The Quick Start deploys IBM Network Shared Disk (NSD) storage server instances and IBM Spectrum Scale compute instances into a virtual private cloud (VPC) in your AWS account. Data and metadata elements are replicated across two Availability Zones for optimal data protection. You can build a new VPC for IBM Spectrum Scale, or deploy the software into your existing VPC. The automated deployment provisions the IBM Spectrum Scale instances in Auto Scaling groups for instance scaling and management. The deployment and configuration tasks are automated by AWS CloudFormation templates that you can customize during launch. You can also use the templates as a starting point for your own implementation, by downloading them from the GitHub repository . The Quick Start includes a guide with step-by-step deployment and configuration instructions. To get started with IBM Spectrum Scale on AWS, use the following resources: - View the architecture and details - View the deployment guide - Browse and launch other AWS Quick Start reference deployments On 25 September 2017 at 08:40, John Hearns wrote: > I guess this is not news on this list, however I did see a reference to > SpectrumScale on The Register this morning, > > which linked to this paper: > > https://s3.amazonaws.com/quickstart-reference/ibm/ > spectrum/scale/latest/doc/ibm-spectrum-scale-on-the-aws-cloud.pdf > > > > The article is here https://www.theregister.co.uk/ > 2017/09/25/storage_super_club_sandwich/ > > 12 Terabyte Helium drives now available. > > > > > -- The information contained in this communication and any attachments is > confidential and may be privileged, and is for the sole use of the intended > recipient(s). Any unauthorized review, use, disclosure or distribution is > prohibited. Unless explicitly stated otherwise in the body of this > communication or the attachment thereto (if any), the information is > provided on an AS-IS basis without any express or implied warranties or > liabilities. To the extent you are relying on this information, you are > doing so at your own risk. If you are not the intended recipient, please > notify the sender immediately by replying to this message and destroy all > copies of this message and any attachments. Neither the sender nor the > company/group of companies he or she represents shall be liable for the > proper and complete transmission of the information contained in this > communication, or for any delay in its receipt. > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Robert.Oesterlin at nuance.com Mon Sep 25 12:42:15 2017 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 25 Sep 2017 11:42:15 +0000 Subject: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Message-ID: <018DE6B7-ADE3-4A01-B23C-9DB668FD95DB@nuance.com> Another data point for Keith/Kristy, I?ve been using Zimon for about 18 months now, and I?ll have to admit it?s been less than robust for long-term data. The biggest issue I?ve run into is the stability of the collector process. I have it crash on a fairly regular basis, most due to memory usage. This results in data loss You can configure it in a highly-available mode that should mitigate this to some degree. However, I don?t think IBM has published any details on how reliable the data collection process is. Bob Oesterlin Sr Principal Storage Engineer, Nuance From: on behalf of Kristy Kallback-Rose Reply-To: gpfsug main discussion list Date: Sunday, September 24, 2017 at 2:29 PM To: gpfsug main discussion list Subject: [EXTERNAL] Re: [gpfsug-discuss] Experience with zimon database stability, and best practices for backup? Hi Keith, We have barely begun with Zimon and have not (knock, knock) run up against any loss or corruption issues with Zimon. However, getting data out of Zimon for various reasons is something I have been thinking about. I'm interested partly because of the granularity that is lost over time like with any round robin style data collection scheme. So I guess one question is whether you have considered pulling the data out to another database, looked at the SS GUI which uses a postgres db (iirc, about to take off on a flight and can't check), or looked at the Grafana bridge which would get data into OpenTsdb format, again iirc. Anyway, just some things for consideration and a request to share back whatever you find out if it's off list. Thanks, getting stink eye to go to airplane mode. More later. Cheers Kristy On Sep 24, 2017 11:05 AM, "Keith Ball" > wrote: Hello All, In a recent Spectrum Scale performance study, we used zimon/mmperfmon to gather metrics. During a period of 2 months, we ended up losing data twice from the zimon database; once after the virtual disk serving both the OS files and zimon collector and DB storage was resized, and a second time after an unknown event (the loss was discovered when plotting in Grafana only went back to a certain data and time; likewise, mmperfmon query output only went back to the same time). Details: - Spectrum Scale 4.2.1.1 (on NSD servers); 4.2.1.2 on the zimon collector node and other clients - Data retention in the "raw" stratum was set to 2 months; the "domains" settings were as follows (note that we did not hit the ceiling of 60GB (1GB/file * 60 files): domains = { # this is the raw domain aggregation = 0 # aggregation factor for the raw domain is always 0. ram = "12g" # amount of RAM to be used duration = "2m" # amount of time that data with the highest precision is kept. filesize = "1g" # maximum file size files = 60 # number of files. }, { # this is the first aggregation domain that aggregates to 10 seconds aggregation = 10 ram = "800m" # amount of RAM to be used duration = "6m" # keep aggregates for 1 week. filesize = "1g" # maximum file size files = 10 # number of files. }, { # this is the second aggregation domain that aggregates to 30*10 seconds == 5 minutes aggregation = 30 ram = "800m" # amount of RAM to be used duration = "1y" # keep averages for 2 months. 
filesize = "1g" # maximum file size files = 5 # number of files. }, { # this is the third aggregation domain that aggregates to 24*30*10 seconds == 2 hours aggregation = 24 ram = "800m" # amount of RAM to be used duration = "2y" # filesize = "1g" # maximum file size files = 5 # number of files. } Questions: 1.) Has anyone had similar issues with losing data from zimon? 2.) Are there known circumstances where data could be lost, e.g. changing the aggregation domain definitions, or even simply restarting the zimon collector? 3.) Does anyone have any "best practices" for backing up the zimon database? We were taking weekly "snapshots" by shutting down the collector, and making a tarball copy of the /opt/ibm/zimon directory (but the database corruption/data loss still crept through for various reasons). In terms of debugging, we do not have Scale or zimon logs going back to the suspected dates of data loss; we do have a gpfs.snap from about a month after the last data loss - would it have any useful clues? Opening a PMR could be tricky, as it was the customer who has the support entitlement, and the environment (specifically the old cluster definitino and the zimon collector VM) was torn down. Many Thanks, Keith -- Keith D. Ball, PhD RedLine Performance Solutions, LLC web: http://www.redlineperf.com/ email: kball at redlineperf.com cell: 540-557-7851 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Mon Sep 25 15:35:33 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Mon, 25 Sep 2017 14:35:33 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Message-ID: <1506350132.352.17.camel@imperial.ac.uk> Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Mon Sep 25 22:41:11 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 25 Sep 2017 21:41:11 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: <1506350132.352.17.camel@imperial.ac.uk> References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From christof.schmitt at us.ibm.com Mon Sep 25 22:41:11 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Mon, 25 Sep 2017 21:41:11 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: <1506350132.352.17.camel@imperial.ac.uk> References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 09:22:05 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 08:22:05 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 09:22:05 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 08:22:05 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: Hi Christof, thanks I?ll try it on a test cluster. 
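Before retesting, it is worth confirming which PTF is actually active on the CES nodes, since the fix Christof points to only applies once 4.2.3 PTF4 is installed. A quick check along these lines (package names can vary slightly by platform) may help:

  # Daemon build level on the node
  mmdiag --version
  # Installed Scale packages, including the separately shipped SMB stack
  rpm -qa | grep -i gpfs
  # Re-list the export ACL to see how the entry is reported at this level
  mmsmb exportacl list neuroscience2 --viewsids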
Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 10:59:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 09:59:13 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? 
Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Tue Sep 26 10:59:13 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 26 Sep 2017 09:59:13 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: <1506350132.352.17.camel@imperial.ac.uk> Message-ID: There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 
Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From christof.schmitt at us.ibm.com Tue Sep 26 21:49:09 2017 From: christof.schmitt at us.ibm.com (Christof Schmitt) Date: Tue, 26 Sep 2017 20:49:09 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: , <1506350132.352.17.camel@imperial.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Wed Sep 27 09:02:51 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 27 Sep 2017 08:02:51 +0000 Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? In-Reply-To: References: , <1506350132.352.17.camel@imperial.ac.uk> Message-ID: I?m sorry, you?re right. I can only assume my brain was looking for an SID entry so when I saw Everyone:ALLOWED/FULL it didn?t process it at all. 4.2.3-4: [root at cesnode ~]# mmsmb exportacl list [testces] ACL:\Everyone:ALLOWED/FULL From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 26 September 2017 21:49 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? The default for the "export ACL" is always to allow access to "Everyone", so that the the "export ACL" does not limit access by default, but only the file system ACL. I do not have systems with these code levels at hand, could you show the difference you see between PTF2 and PTF4? Regards, Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: "gpfsug-discuss at gpfsug.org" > Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Tue, Sep 26, 2017 2:59 AM There isn?t a default ACL being applied to the export at all now, which is fine, but it differs from the behaviour in 4.2.3 PTF2. 
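If it is specifically the --SID option that is broken at this level, the usage text quoted above also accepts a positional Name, so addressing the entry by name may work as an interim step. This is an untested sketch: the export name is the one from the example above, and the exact spelling of the ACE (for example Everyone, shown with a leading backslash in some listings) has to be taken from the local output.

  # See how the entry is named with and without SID translation
  mmsmb exportacl list neuroscience2
  mmsmb exportacl list neuroscience2 --viewsids
  # Try addressing the ACE by name instead of by --SID
  mmsmb exportacl remove neuroscience2 Everyone --access ALLOWED --permissions FULL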
Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Sobey, Richard A Sent: 26 September 2017 09:22 To: gpfsug main discussion list > Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Hi Christof, thanks I?ll try it on a test cluster. Richard From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt Sent: 25 September 2017 22:41 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? 4.2.3 PTF4 seems to have a fix for this area. Can you try again with that PTF installed? Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ christof.schmitt at us.ibm.com || +1-520-799-2469 (T/L: 321-2469) ----- Original message ----- From: "Sobey, Richard A" > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at gpfsug.org" > Cc: Subject: [gpfsug-discuss] mmsmb exportacl remove - syntax changed? Date: Mon, Sep 25, 2017 7:35 AM Hi all, This used to work (removing a SID from an ACL), but doesn't any more. Looks like a bug unless I'm being stupid. [root at cesnode~]# mmsmb exportacl list neuroscience2 --viewsids [neuroscience2] REVISION:1 ACL:S-1-1-0:ALLOWED/FULL [root at cesnode ~] mmsmb exportacl remove neuroscience2 --SID S-1-1-0 mmsmb exportacl remove: Incorrect option: --sid Usage: mmsmb exportacl remove ExportName {Name | --user UserName | --group GroupName | --system SystemName | --SID SID} [--access Access] [--permissions Permissions] [--viewsddl] [--viewsids] [-h|--help] where: Access is one of ALLOWED, DENIED Permissions is one of FULL, CHANGE, READ or any combination of RWXDPO I've tried lower case SID i.e --sid, and specifying --access ALLOWED and --permissions FULL. Omitting the --SID argument entirely simply results in GPFS telling me I must specify a Name or an SID. [root at cesnode~]# mmsmb exportacl remove neuroscience2 --access ALLOWED --permissions FULL [E] The mmsmb exportacl remove command requires a Name or SID. Can anyone see my mistake? Richard _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=c3uTUNFbPTWWcTNUNVVejlQ0xdnhBAfQdouTBlVgnjc&s=hsWeRhH-BhTEaAlrTPbJGlwCV-5Ui7t03Zcec9kywOA&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=5Nn7eUPeYe291x8f39jKybESLKv_W_XtkTkS8fTR-NI&m=B-AqKIRCmLBzoWAhGn7NY-ZASOX25NuP_c_ndE8gy4A&s=S06OD3mbRedYjfwETO8tUnlOjnWT7pOX8nsYX5ebIdA&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From kenneth.waegeman at ugent.be Wed Sep 27 09:16:49 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Wed, 27 Sep 2017 10:16:49 +0200 Subject: [gpfsug-discuss] el7.4 compatibility Message-ID: Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! 
Kenneth From michael.holliday at crick.ac.uk Wed Sep 27 09:25:58 2017 From: michael.holliday at crick.ac.uk (Michael Holliday) Date: Wed, 27 Sep 2017 08:25:58 +0000 Subject: [gpfsug-discuss] File Quotas vs Inode Limits Message-ID: Hi All, I'm in process of setting up quota for our users. We currently have block quotas per file set, and an inode limit for each inode space. Our users have request more transparency relating to the inode limit as as it is they can't see any information. Are there any disadvantages to implementing file quotas, and increasing the inode limits so that they will not be reached? Michael Michael Holliday HPC Systems Engineer Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbanister at jumptrading.com Wed Sep 27 14:59:08 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Wed, 27 Sep 2017 13:59:08 +0000 Subject: [gpfsug-discuss] File Quotas vs Inode Limits In-Reply-To: References: Message-ID: Actually you will get a benefit in that you can set up a callback so that users get alerted when they got over a soft quota. We also set up a fileset quota so that the callback will automatically notify users when they exceed their block and file quotas for their fileset as well. Hope that helps, -Bryan From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Michael Holliday Sent: Wednesday, September 27, 2017 4:26 AM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] File Quotas vs Inode Limits Note: External Email ________________________________ Hi All, I'm in process of setting up quota for our users. We currently have block quotas per file set, and an inode limit for each inode space. Our users have request more transparency relating to the inode limit as as it is they can't see any information. Are there any disadvantages to implementing file quotas, and increasing the inode limits so that they will not be reached? Michael Michael Holliday HPC Systems Engineer Tel: 0203 796 3167 The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. -------------- next part -------------- An HTML attachment was scrubbed... 
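To make the callback suggestion concrete, a per-fileset quota plus a soft-quota callback could be wired up roughly as below. This is a sketch rather than a tested recipe: the file system name (gpfs01), fileset name (projects), limits and notification script are placeholders, and the event name and parameter list should be checked against the mmaddcallback documentation for the installed release.

  # Per-fileset block and file quotas (soft:hard)
  mmsetquota gpfs01:projects --block 8T:10T --files 5M:6M
  # Check the fileset's current usage against those limits
  mmlsquota -j projects gpfs01

  # Callback that runs a notification script when a soft quota is exceeded
  mmaddcallback softQuotaAlert \
      --command /usr/local/sbin/quota_notify.sh \
      --event softQuotaExceeded \
      --parms "%eventName %fsName"

  # Fileset inode limits stay visible to admins via:
  mmlsfileset gpfs01 -L

With file quotas in place, users can see their own numbers through mmlsquota, which addresses the transparency point, while the inode-space limit can be left high enough that it is never the first thing they hit.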
URL: From Greg.Lehmann at csiro.au Thu Sep 28 00:44:53 2017 From: Greg.Lehmann at csiro.au (Greg.Lehmann at csiro.au) Date: Wed, 27 Sep 2017 23:44:53 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: Message-ID: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From bbanister at jumptrading.com Thu Sep 28 14:21:34 2017 From: bbanister at jumptrading.com (Bryan Banister) Date: Thu, 28 Sep 2017 13:21:34 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: Please review this site: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au Sent: Wednesday, September 27, 2017 6:45 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] el7.4 compatibility Note: External Email ------------------------------------------------- I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. From JRLang at uwyo.edu Thu Sep 28 15:18:52 2017 From: JRLang at uwyo.edu (Jeffrey R. 
Lang) Date: Thu, 28 Sep 2017 14:18:52 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: I just tired to build the GPFS GPL module against the latest version of RHEL 7.4 kernel and the build fails. The link below show that it should work. cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread kdump-kern.o: In function `GetOffset': kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' kdump-kern.o: In function `KernInit': kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' collect2: error: ld returned 1 exit status make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# [root at bkupsvr3 ~]# uname -a Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux [root at bkupsvr3 ~]# mmdiag --version === mmdiag: version === Current GPFS build: "4.2.2.3 ". Built on Mar 16 2017 at 11:19:59 In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my case 514.26.2 If I'm missing something can some one point me in the right direction? -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister Sent: Thursday, September 28, 2017 8:22 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Please review this site: https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html Hope that helps, -Bryan -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au Sent: Wednesday, September 27, 2017 6:45 PM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] el7.4 compatibility Note: External Email ------------------------------------------------- I guess I may as well ask about SLES 12 SP3 as well! TIA. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman Sent: Wednesday, 27 September 2017 6:17 PM To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] el7.4 compatibility Hi, Is there already some information available of gpfs (and protocols) on el7.4 ? Thanks! Kenneth _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. 
Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Thu Sep 28 15:22:54 2017 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Thu, 28 Sep 2017 16:22:54 +0200 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: <20170928142254.xwjvp3qwnilazer7@ics.muni.cz> You need 4.2.3.4 GPFS version and it will work. On Thu, Sep 28, 2017 at 02:18:52PM +0000, Jeffrey R. Lang wrote: > I just tired to build the GPFS GPL module against the latest version of RHEL 7.4 kernel and the build fails. The link below show that it should work. > > cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread > kdump-kern.o: In function `GetOffset': > kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' > kdump-kern.o: In function `KernInit': > kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' > collect2: error: ld returned 1 exit status > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# > [root at bkupsvr3 ~]# uname -a > Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux > [root at bkupsvr3 ~]# mmdiag --version > > === mmdiag: version === > Current GPFS build: "4.2.2.3 ". > Built on Mar 16 2017 at 11:19:59 > > In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my case 514.26.2 > > If I'm missing something can some one point me in the right direction? > > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan Banister > Sent: Thursday, September 28, 2017 8:22 AM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] el7.4 compatibility > > Please review this site: > > https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > > Hope that helps, > -Bryan > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Greg.Lehmann at csiro.au > Sent: Wednesday, September 27, 2017 6:45 PM > To: gpfsug-discuss at spectrumscale.org > Subject: Re: [gpfsug-discuss] el7.4 compatibility > > Note: External Email > ------------------------------------------------- > > I guess I may as well ask about SLES 12 SP3 as well! TIA. 
> > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth Waegeman > Sent: Wednesday, 27 September 2017 6:17 PM > To: gpfsug-discuss at spectrumscale.org > Subject: [gpfsug-discuss] el7.4 compatibility > > Hi, > > Is there already some information available of gpfs (and protocols) on > el7.4 ? > > Thanks! > > Kenneth > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > ________________________________ > > Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Luk?? Hejtm?nek From S.J.Thompson at bham.ac.uk Thu Sep 28 15:23:53 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 28 Sep 2017 14:23:53 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: The 7.4 kernels are listed as having been tested by IBM. Having said that, we have clients running 7.4 kernel and its OK, but we are 4.2.3.4efix2, so bump versions... Simon On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jeffrey R. Lang" wrote: >I just tired to build the GPFS GPL module against the latest version of >RHEL 7.4 kernel and the build fails. The link below show that it should >work. > >cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >kdump-kern.o: In function `GetOffset': >kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >kdump-kern.o: In function `KernInit': >kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >collect2: error: ld returned 1 exit status >make[1]: *** [modules] Error 1 >make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >make: *** [Modules] Error 1 >-------------------------------------------------------- >mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >-------------------------------------------------------- >mmbuildgpl: Command failed. Examine previous error messages to determine >cause. 
>[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# >[root at bkupsvr3 ~]# uname -a >Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >[root at bkupsvr3 ~]# mmdiag --version > >=== mmdiag: version === >Current GPFS build: "4.2.2.3 ". >Built on Mar 16 2017 at 11:19:59 > >In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >case 514.26.2 > >If I'm missing something can some one point me in the right direction? > > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >Banister >Sent: Thursday, September 28, 2017 8:22 AM >To: gpfsug main discussion list >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Please review this site: > >https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html > >Hope that helps, >-Bryan > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >Greg.Lehmann at csiro.au >Sent: Wednesday, September 27, 2017 6:45 PM >To: gpfsug-discuss at spectrumscale.org >Subject: Re: [gpfsug-discuss] el7.4 compatibility > >Note: External Email >------------------------------------------------- > >I guess I may as well ask about SLES 12 SP3 as well! TIA. > >-----Original Message----- >From: gpfsug-discuss-bounces at spectrumscale.org >[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >Waegeman >Sent: Wednesday, 27 September 2017 6:17 PM >To: gpfsug-discuss at spectrumscale.org >Subject: [gpfsug-discuss] el7.4 compatibility > >Hi, > >Is there already some information available of gpfs (and protocols) on >el7.4 ? > >Thanks! > >Kenneth > >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > >________________________________ > >Note: This email is for the confidential use of the named addressee(s) >only and may contain proprietary, confidential or privileged information. >If you are not the intended recipient, you are hereby notified that any >review, dissemination or copying of this email is strictly prohibited, >and to please notify the sender immediately and destroy this email and >any attachments. Email transmission cannot be guaranteed to be secure or >error-free. The Company, therefore, does not make any guarantees as to >the completeness or accuracy of this email or any attachments. This email >is for informational purposes only and does not constitute a >recommendation, offer, request or solicitation of any kind to buy, sell, >subscribe, redeem or perform any type of transaction of a financial >product. 
>_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss >_______________________________________________ >gpfsug-discuss mailing list >gpfsug-discuss at spectrumscale.org >http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kenneth.waegeman at ugent.be Thu Sep 28 15:36:04 2017 From: kenneth.waegeman at ugent.be (Kenneth Waegeman) Date: Thu, 28 Sep 2017 16:36:04 +0200 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> Message-ID: <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: > The 7.4 kernels are listed as having been tested by IBM. Hi, Were did you find this? > > Having said that, we have clients running 7.4 kernel and its OK, but we > are 4.2.3.4efix2, so bump versions... Do you have some information about the efix2? Is this for 7.4 ? And where should we find this :-) Thank you! Kenneth > > Simon > > On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on behalf > of Jeffrey R. Lang" JRLang at uwyo.edu> wrote: > >> I just tired to build the GPFS GPL module against the latest version of >> RHEL 7.4 kernel and the build fails. The link below show that it should >> work. >> >> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >> kdump-kern.o: In function `GetOffset': >> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >> kdump-kern.o: In function `KernInit': >> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >> collect2: error: ld returned 1 exit status >> make[1]: *** [modules] Error 1 >> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >> make: *** [Modules] Error 1 >> -------------------------------------------------------- >> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >> -------------------------------------------------------- >> mmbuildgpl: Command failed. Examine previous error messages to determine >> cause. >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# >> [root at bkupsvr3 ~]# uname -a >> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >> [root at bkupsvr3 ~]# mmdiag --version >> >> === mmdiag: version === >> Current GPFS build: "4.2.2.3 ". >> Built on Mar 16 2017 at 11:19:59 >> >> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >> case 514.26.2 >> >> If I'm missing something can some one point me in the right direction? 
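On the build failure quoted above: the undefined page_offset_base reference generally means the GPL source shipped with that Scale level does not yet know the newer RHEL 7.4 kernel, which matches the advice in this thread to move to 4.2.3.4 or later and then rebuild the portability layer on each node. A rough sketch of the check-and-rebuild step, assuming the updated Scale packages are already installed:

  # Record what is running and what is installed
  uname -r
  mmdiag --version
  # Rebuild the portability layer against the running kernel
  /usr/lpp/mmfs/bin/mmbuildgpl
  # Until the newer Scale level is rolled out, one common stopgap is to hold
  # the node on a kernel level the FAQ lists as tested, for example by
  # excluding kernel updates in yum (exclude=kernel* in /etc/yum.conf),
  # site policy permitting.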
>> >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >> Banister >> Sent: Thursday, September 28, 2017 8:22 AM >> To: gpfsug main discussion list >> Subject: Re: [gpfsug-discuss] el7.4 compatibility >> >> Please review this site: >> >> https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html >> >> Hope that helps, >> -Bryan >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >> Greg.Lehmann at csiro.au >> Sent: Wednesday, September 27, 2017 6:45 PM >> To: gpfsug-discuss at spectrumscale.org >> Subject: Re: [gpfsug-discuss] el7.4 compatibility >> >> Note: External Email >> ------------------------------------------------- >> >> I guess I may as well ask about SLES 12 SP3 as well! TIA. >> >> -----Original Message----- >> From: gpfsug-discuss-bounces at spectrumscale.org >> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >> Waegeman >> Sent: Wednesday, 27 September 2017 6:17 PM >> To: gpfsug-discuss at spectrumscale.org >> Subject: [gpfsug-discuss] el7.4 compatibility >> >> Hi, >> >> Is there already some information available of gpfs (and protocols) on >> el7.4 ? >> >> Thanks! >> >> Kenneth >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> ________________________________ >> >> Note: This email is for the confidential use of the named addressee(s) >> only and may contain proprietary, confidential or privileged information. >> If you are not the intended recipient, you are hereby notified that any >> review, dissemination or copying of this email is strictly prohibited, >> and to please notify the sender immediately and destroy this email and >> any attachments. Email transmission cannot be guaranteed to be secure or >> error-free. The Company, therefore, does not make any guarantees as to >> the completeness or accuracy of this email or any attachments. This email >> is for informational purposes only and does not constitute a >> recommendation, offer, request or solicitation of any kind to buy, sell, >> subscribe, redeem or perform any type of transaction of a financial >> product. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Thu Sep 28 15:45:25 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Thu, 28 Sep 2017 14:45:25 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Aren't listed as tested Sorry ... 
4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but we >> are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf >> of Jeffrey R. Lang" >of >> JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version of >>> RHEL 7.4 kernel and the build fails. The link below show that it >>>should >>> work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine >>> cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In my >>> case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? >>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.ht >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. 
>>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Kenneth >>> Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named addressee(s) >>> only and may contain proprietary, confidential or privileged >>>information. >>> If you are not the intended recipient, you are hereby notified that any >>> review, dissemination or copying of this email is strictly prohibited, >>> and to please notify the sender immediately and destroy this email and >>> any attachments. Email transmission cannot be guaranteed to be secure >>>or >>> error-free. The Company, therefore, does not make any guarantees as to >>> the completeness or accuracy of this email or any attachments. This >>>email >>> is for informational purposes only and does not constitute a >>> recommendation, offer, request or solicitation of any kind to buy, >>>sell, >>> subscribe, redeem or perform any type of transaction of a financial >>> product. >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From aaron.s.knister at nasa.gov Fri Sep 29 02:59:39 2017 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]) Date: Fri, 29 Sep 2017 01:59:39 +0000 Subject: [gpfsug-discuss] Latest recommended 4.2 efix? Message-ID: Hi Everyone, What?s the latest recommended efix release for 4.2.3.4? I?m working on testing a 4.1 to 4.2 migration and was reminded today of some fun bugs in 4.2.3.4 for which I think there are efixes. Alternatively, any word on a 4.2.3.5 release date? -Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.hearns at asml.com Fri Sep 29 10:02:26 2017 From: john.hearns at asml.com (John Hearns) Date: Fri, 29 Sep 2017 09:02:26 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Simon, I would appreciate a heads up on that AFM issue. I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is if a remote NFS mount goes down then an asynchronous operation such as a read can be stopped. I must admit to being not clued up on how the efixes are distributed. I downloaded the 4.2.3.4 installer for Linux yesterday. 
Should I be searching for additional fix packs on top of that (which I am in fact doing now). John H -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, September 28, 2017 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Aren't listed as tested Sorry ... 4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but >> we are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf of Jeffrey R. Lang" >on behalf of JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version >>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>should work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>> my case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? 
>>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Kenneth Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) >>> on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named >>>addressee(s) only and may contain proprietary, confidential or >>>privileged information. >>> If you are not the intended recipient, you are hereby notified that >>>any review, dissemination or copying of this email is strictly >>>prohibited, and to please notify the sender immediately and destroy >>>this email and any attachments. Email transmission cannot be >>>guaranteed to be secure or error-free. The Company, therefore, does >>>not make any guarantees as to the completeness or accuracy of this >>>email or any attachments. This email is for informational purposes >>>only and does not constitute a recommendation, offer, request or >>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>any type of transaction of a financial product. 
>>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> >>>https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >>>sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >>>rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >>>39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >>>pw%3D&reserved=0 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpf >> sug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hea >> rns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a >> 39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6 >> pw%3D&reserved=0 > _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Ytqc6pw%3D&reserved=0 -- The information contained in this communication and any attachments is confidential and may be privileged, and is for the sole use of the intended recipient(s). Any unauthorized review, use, disclosure or distribution is prohibited. Unless explicitly stated otherwise in the body of this communication or the attachment thereto (if any), the information is provided on an AS-IS basis without any express or implied warranties or liabilities. To the extent you are relying on this information, you are doing so at your own risk. If you are not the intended recipient, please notify the sender immediately by replying to this message and destroy all copies of this message and any attachments. Neither the sender nor the company/group of companies he or she represents shall be liable for the proper and complete transmission of the information contained in this communication, or for any delay in its receipt. From r.sobey at imperial.ac.uk Fri Sep 29 10:04:49 2017 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 29 Sep 2017 09:04:49 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Efixes (in my one time only limited experience!) come direct from IBM as a result of a PMR. Richard -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of John Hearns Sent: 29 September 2017 10:02 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Simon, I would appreciate a heads up on that AFM issue. I upgraded to 4.2.3.4 this morning, to deal with an AFM issue, which is if a remote NFS mount goes down then an asynchronous operation such as a read can be stopped. 
I must admit to being not clued up on how the efixes are distributed. I downloaded the 4.2.3.4 installer for Linux yesterday. Should I be searching for additional fix packs on top of that (which I am in fact doing now). John H -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (IT Research Support) Sent: Thursday, September 28, 2017 4:45 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] el7.4 compatibility Aren't listed as tested Sorry ... 4.2.3.4 we have used with 7.4 as well, efix2 includes a fix for an AFM issue we have. Simon On 28/09/2017, 15:36, "kenneth.waegeman at ugent.be" wrote: > > >On 28/09/17 16:23, Simon Thompson (IT Research Support) wrote: >> The 7.4 kernels are listed as having been tested by IBM. >Hi, > >Were did you find this? >> >> Having said that, we have clients running 7.4 kernel and its OK, but >> we are 4.2.3.4efix2, so bump versions... >Do you have some information about the efix2? Is this for 7.4 ? And >where should we find this :-) > >Thank you! > >Kenneth > >> >> Simon >> >> On 28/09/2017, 15:18, "gpfsug-discuss-bounces at spectrumscale.org on >>behalf of Jeffrey R. Lang" >on behalf of JRLang at uwyo.edu> wrote: >> >>> I just tired to build the GPFS GPL module against the latest version >>>of RHEL 7.4 kernel and the build fails. The link below show that it >>>should work. >>> >>> cc kdump.o kdump-kern.o kdump-kern-dwarfs.o -o kdump -lpthread >>> kdump-kern.o: In function `GetOffset': >>> kdump-kern.c:(.text+0x9): undefined reference to `page_offset_base' >>> kdump-kern.o: In function `KernInit': >>> kdump-kern.c:(.text+0x58): undefined reference to `page_offset_base' >>> collect2: error: ld returned 1 exit status >>> make[1]: *** [modules] Error 1 >>> make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' >>> make: *** [Modules] Error 1 >>> -------------------------------------------------------- >>> mmbuildgpl: Building GPL module failed at Thu Sep 28 08:12:14 MDT 2017. >>> -------------------------------------------------------- >>> mmbuildgpl: Command failed. Examine previous error messages to >>>determine cause. >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# >>> [root at bkupsvr3 ~]# uname -a >>> Linux bkupsvr3.arcc.uwyo.edu 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat >>>Sep 9 >>> 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux >>> [root at bkupsvr3 ~]# mmdiag --version >>> >>> === mmdiag: version === >>> Current GPFS build: "4.2.2.3 ". >>> Built on Mar 16 2017 at 11:19:59 >>> >>> In order to use GPFS with RHEL 7.4 I have to use a 7.3 kernel. In >>> my case 514.26.2 >>> >>> If I'm missing something can some one point me in the right direction? 
>>> >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Bryan >>> Banister >>> Sent: Thursday, September 28, 2017 8:22 AM >>> To: gpfsug main discussion list >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Please review this site: >>> >>> >>>https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fww >>>w.ibm.com%2Fsupport%2Fknowledgecenter%2Fen%2FSTXKQY%2Fgpfsclustersfaq >>>.ht&data=01%7C01%7Cjohn.hearns%40asml.com%7C1c91f855bc124c31f81a08d50 >>>67f949a%7Caf73baa8f5944eb2a39d93e96cad61fc%7C1&sdata=nK6KEzCD62kU3njL >>>kIFKL69V3jyN836K5pHMX19tWk8%3D&reserved=0 >>>ml >>> >>> Hope that helps, >>> -Bryan >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Greg.Lehmann at csiro.au >>> Sent: Wednesday, September 27, 2017 6:45 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: Re: [gpfsug-discuss] el7.4 compatibility >>> >>> Note: External Email >>> ------------------------------------------------- >>> >>> I guess I may as well ask about SLES 12 SP3 as well! TIA. >>> >>> -----Original Message----- >>> From: gpfsug-discuss-bounces at spectrumscale.org >>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of >>> Kenneth Waegeman >>> Sent: Wednesday, 27 September 2017 6:17 PM >>> To: gpfsug-discuss at spectrumscale.org >>> Subject: [gpfsug-discuss] el7.4 compatibility >>> >>> Hi, >>> >>> Is there already some information available of gpfs (and protocols) >>> on >>> el7.4 ? >>> >>> Thanks! >>> >>> Kenneth >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgp >>> fsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=01%7C01%7Cjohn.h >>> earns%40asml.com%7C1c91f855bc124c31f81a08d5067f949a%7Caf73baa8f5944e >>> b2a39d93e96cad61fc%7C1&sdata=NFMrLpW8bakClzmF4zCC%2BUb2oi04Qw3N6cc2Y >>> tqc6pw%3D&reserved=0 >>> >>> >>> ________________________________ >>> >>> Note: This email is for the confidential use of the named >>>addressee(s) only and may contain proprietary, confidential or >>>privileged information. >>> If you are not the intended recipient, you are hereby notified that >>>any review, dissemination or copying of this email is strictly >>>prohibited, and to please notify the sender immediately and destroy >>>this email and any attachments. Email transmission cannot be >>>guaranteed to be secure or error-free. The Company, therefore, does >>>not make any guarantees as to the completeness or accuracy of this >>>email or any attachments. This email is for informational purposes >>>only and does not constitute a recommendation, offer, request or >>>solicitation of any kind to buy, sell, subscribe, redeem or perform >>>any type of transaction of a financial product. 
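A rough sketch of the checks involved in the build failure above (the kernel versions are the ones mentioned in this thread; the grubby menu entry path is only an assumed example -- consult the Spectrum Scale FAQ for the kernel levels actually tested with your release):

    # Confirm the installed Scale build and the running kernel
    /usr/lpp/mmfs/bin/mmdiag --version   # reports "Current GPFS build", e.g. 4.2.2.3
    uname -r                             # e.g. 3.10.0-693.2.2.el7.x86_64 (a 7.4 kernel)

    # If that pairing is not listed as tested in the FAQ, boot a supported 7.3
    # kernel (assumed path below, per the 514.26.2 level cited in the thread)
    # and rebuild the portability layer
    grubby --set-default=/boot/vmlinuz-3.10.0-514.26.2.el7.x86_64
    reboot
    /usr/lpp/mmfs/bin/mmbuildgpl         # should then compile and link cleanly
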
From S.J.Thompson at bham.ac.uk Fri Sep 29 10:39:43 2017 From: S.J.Thompson at bham.ac.uk (Simon Thompson (IT Research Support)) Date: Fri, 29 Sep 2017 09:39:43 +0000 Subject: [gpfsug-discuss] el7.4 compatibility In-Reply-To: References: <0285ed0191fa43ac9b1f1b3e36a1f015@exch1-cdc.nexus.csiro.au> <087bdf18-5763-0154-8515-7b9d04e5b302@ugent.be> Message-ID: Correct, they come from IBM support. The AFM issue we have (and is fixed in the efix) is if you have client code running on the AFM cache that uses truncate. The AFM write coalescing processing does something funny with it, so the file isn't truncated and then the data you write afterwards isn't copied back to home. We found this with ABAQUS code running on our HPC nodes onto the AFM cache, i.e. 
at home, the final packed output file from ABAQUS is corrupt as it's the "untruncated and then filled" version of the file (so just a big blob of empty data). I would guess that anything using truncate would see the same issue. 4.2.3.x: APAR IV99796. See IBM Flash Alert at: http://www-01.ibm.com/support/docview.wss?uid=ssg1S1010629&myns=s033&mynp=OCSTXKQY&mynp=OCSWJ00&mync=E&cm_sp=s033-_-OCSTXKQY-OCSWJ00-_-E It's remedied in efix2; of course, remember that an efix has not gone through a full testing validation cycle (otherwise it would be a PTF), but we have not seen any issues in our environments running 4.2.3.4efix2. Simon 
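A rough sketch of how the truncate behaviour could be checked on a test fileset (all paths, the device name "fsname", and the fileset name are placeholders; mmafmctl flushPending is used here only to push queued cache operations to home before comparing copies):

    # on the AFM cache fileset: write, truncate, then write again
    cd /gpfs/cache/fileset1                      # placeholder cache path
    dd if=/dev/zero of=trunc_test bs=1M count=8  # create an 8 MiB file
    truncate -s 0 trunc_test                     # shrink it to zero
    echo "data written after truncate" >> trunc_test

    # flush queued operations to home, then compare cache and home copies
    mmafmctl fsname flushPending -j fileset1     # placeholder device and fileset names
    md5sum /gpfs/cache/fileset1/trunc_test
    md5sum /nfs/home-export/fileset1/trunc_test  # on affected 4.2.3.x levels the home
                                                 # copy can remain the old, unshrunk file
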
From scale at us.ibm.com Fri Sep 29 13:26:51 2017 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Fri, 29 Sep 2017 07:26:51 -0500 Subject: [gpfsug-discuss] Latest recommended 4.2 efix? In-Reply-To: References: Message-ID: There isn't a "recommended" efix as such. Generally, fixes go into the next ptf so that they go through a test cycle. 
If a customer hits a serious issue that cannot wait for the next ptf, they can request an efix be built, but since efixes do not get the same level of rigorous testing as a ptf, they are not generally recommended unless you report an issue and service determines you need it. To address your other questions: We are currently up to efix3 on 4.2.3.4. We don't announce PTF dates, because they depend upon the testing; however, you can see that we generally release a PTF roughly every 6 weeks, and I believe ptf4 was out on 8/24. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: "Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]" To: "discussion, gpfsug main" Date: 09/28/2017 08:59 PM Subject: [gpfsug-discuss] Latest recommended 4.2 efix? Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi Everyone, What's the latest recommended efix release for 4.2.3.4? I'm working on testing a 4.1 to 4.2 migration and was reminded today of some fun bugs in 4.2.3.4 for which I think there are efixes. Alternatively, any word on a 4.2.3.5 release date? -Aaron _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=IVcYH9EDg-UaA4Jt2GbsxN5XN1XbvejXTX0gAzNxtpM&s=9SmogyyA6QNSWxlZrpE-vBbslts0UexwJwPzp78LgKs&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From sandeep.patil at in.ibm.com Sat Sep 30 05:02:22 2017 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Sat, 30 Sep 2017 09:32:22 +0530 Subject: [gpfsug-discuss] Spectrum Scale Enablement Material - 1H 2017 Message-ID: Hi Folks, I was asked by Doris Conti to send the below to our Spectrum Scale User group. Below is a consolidated link that lists all the enablement on Spectrum Scale/ESS that was done in 1H 2017, which includes blogs and videos from development and offering management. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media Do note, Spectrum Scale developers keep blogging on the below site, which is worth bookmarking: https://developer.ibm.com/storage/blog/ (as recent as 4 new blogs in Sept) Thanks Sandeep Linkedin: https://www.linkedin.com/in/sandeeprpatil Spectrum Scale Dev. -------------- next part -------------- An HTML attachment was scrubbed... URL: